BUG: read_stata ignoring encoding?

I don't have time to debug this right now, and maybe my expectations are just off, but it looks like read_stata doesn't respect the encoding keyword. I'm also not sure the keyword is needed. AFAIK, Stata doesn't (and likely won't) support Unicode. It always uses latin-1, so we could always decode strings as latin-1 (though that may not be desirable).
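To illustrate why always falling back to latin-1 would at least be safe, here is a small stdlib-only sketch. The sample value "Köln" is made up for illustration and is not taken from the attached file. It shows that latin-1 bytes decode cleanly with latin-1, that the same bytes are not valid UTF-8, and that latin-1 can decode any byte value at all.

```python
# Hypothetical sample value: "Köln" as it would be stored in latin-1 bytes.
raw = "Köln".encode("latin-1")  # one byte per character

# Decoding with the matching codec recovers the text.
decoded = raw.decode("latin-1")

# The same bytes are not valid UTF-8, so a UTF-8 decode raises.
try:
    raw.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# latin-1 maps every byte 0..255 to exactly one code point,
# so decoding arbitrary bytes as latin-1 never raises.
every_byte = bytes(range(256))
round_trip = every_byte.decode("latin-1").encode("latin-1")

print(decoded, utf8_failed, round_trip == every_byte)
```

The flip side is that latin-1 never fails even when the bytes were written in some other encoding, so it can silently produce the wrong characters. That is probably the "not desirable" part.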

https://www.dropbox.com/s/hq42trq4327ker8/encoding_issue.dta

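import pandas as pd

# Default call, no encoding given
dta = pd.read_stata("./encoding_issue.dta")
dta.head()

# Passing encoding explicitly does not appear to change the result
dta = pd.read_stata("./encoding_issue.dta", encoding="latin-1")
dta.head()

# Decoding the string column by hand after reading
dta = pd.read_stata("./encoding_issue.dta")
dta.kreis1849.str.decode("latin-1")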
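In the meantime, the manual decode in the last snippet could be applied to every affected column at once. The following is only a rough sketch, not anything pandas provides: `decode_bytes_columns` is a hypothetical helper, and it assumes the affected columns come back holding raw `bytes` values. The small frame stands in for the real file, with made-up values.

```python
import pandas as pd


def decode_bytes_columns(df, encoding="latin-1"):
    """Return a copy of df with all-bytes columns decoded to text.

    Hypothetical helper: assumes the affected columns hold raw bytes.
    """
    out = df.copy()
    for col in out.columns:
        values = out[col].dropna()
        # Only touch columns whose non-missing values are all bytes.
        if len(values) and values.map(lambda v: isinstance(v, bytes)).all():
            out[col] = out[col].str.decode(encoding)
    return out


# Stand-in for the data read from the .dta file (values are made up).
frame = pd.DataFrame({"kreis1849": [b"K\xf6ln", b"M\xfcnchen"], "n": [1, 2]})
fixed = decode_bytes_columns(frame)
print(fixed)
```

Columns that are numeric or already text are left alone, so this should be harmless to run on the whole frame.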