Support writing unicode characters in df.to_stata()

Code Sample, a copy-pastable example if possible

    import pandas as pd
    df = pd.DataFrame({'a': ['丆']})
    df.to_stata('test.dta')

UnicodeEncodeError: 'latin-1' codec can't encode character '\u4e06' in position 0: ordinal not in range(256)

I picked an arbitrary CJK character to test this with.
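
The failure can be reproduced without pandas, assuming (as the traceback suggests) that the writer encodes string columns as latin-1. A minimal sketch, where `can_encode` is a hypothetical helper used only for illustration:

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of `text` is representable in `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


sample = '\u4e06'  # the CJK character from the example above

latin1_ok = can_encode(sample, 'latin-1')  # outside the 0-255 range latin-1 covers
utf8_ok = can_encode(sample, 'utf-8')      # UTF-8 can represent any code point
```

Any character with an ordinal above 255 should trigger the same error, so the choice of a CJK character is not special.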

Problem description

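The current writer encodes strings as latin-1, so any character outside that range cannot be written. It would be possible to write Unicode strings to a Stata file by implementing a writer for version 118 of the dta format, which stores strings as UTF-8.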
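
As a rough illustration of what such a writer would need to do for string columns, the sketch below encodes values as UTF-8 and pads them with null bytes to a common byte width. The helper name `encode_fixed_width` and the padding scheme are assumptions for illustration, not the pandas implementation or a complete reading of the dta 118 specification.

```python
def encode_fixed_width(values, encoding='utf-8'):
    """Encode strings and null-pad them to the widest encoded value.

    Hypothetical helper: the width is measured in bytes after encoding,
    not in characters, since one character may need several bytes.
    """
    encoded = [v.encode(encoding) for v in values]
    # Use at least one byte so an all-empty column still has a valid width.
    width = max(max((len(b) for b in encoded), default=0), 1)
    return width, [b.ljust(width, b'\x00') for b in encoded]


width, fields = encode_fixed_width(['\u4e06', 'ab'])
```

The key design point is that column widths have to be computed from the encoded byte length: the single character in the first value takes three bytes in UTF-8, so it determines the width here even though the second value has more characters.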

~~I'd be interested in trying to submit a PR for this.~~ (Edit: I don't use Stata anymore)

Expected Output

Stata file written to disk.

Output of `pd.show_versions()`
