get_dummies chokes on unicode values · Issue #6885 · pandas-dev/pandas

(Context: pandas version 0.13.1 running on Python 2.7.6 |Anaconda 1.9.1 (64-bit)| (default, Nov 11 2013, 10:49:15) [MSC v.1500 64 bit (AMD64)])

In my code I have a category containing lots of non-English names and want to create dummies out of it.

So I call:

dummies=pandas.get_dummies(data[cat], prefix=prefix)

and get:

c:\Anaconda\lib\site-packages\pandas\core\reshape.pyc in get_dummies(data, prefix, prefix_sep, dummy_na)
    971     if prefix is not None:
    972         dummy_cols = ['%s%s%s' % (prefix, prefix_sep, str(v))
--> 973                       for v in levels]
    974     else:
    975         dummy_cols = levels

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 19: ordinal not in range(128)

The issue appears to be the call to str(v): if v is a unicode string containing non-ASCII characters, Python 2 tries to encode it with the default ASCII codec, which raises the UnicodeEncodeError shown above.
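To illustrate, here is a minimal sketch of the failure and of one possible way around it. This is not the actual pandas code or patch: `make_dummy_cols` is a hypothetical helper name, and the sample values are made up. It assumes the fix is simply to format the values into a unicode template instead of passing them through str().

```python
def make_dummy_cols(levels, prefix, prefix_sep=u"_"):
    """Hypothetical helper: build dummy column names without calling str(v)."""
    # Formatting into a unicode template keeps non-ASCII values intact,
    # whereas str(v) on Python 2 implicitly encodes v with the ASCII codec.
    return [u"%s%s%s" % (prefix, prefix_sep, v) for v in levels]


# Made-up sample values containing non-ASCII characters.
levels = [u"Andr\xe9", u"Zo\xeb", u"Bob"]
cols = make_dummy_cols(levels, u"name")

# Explicitly doing what str(v) does implicitly on Python 2 for a unicode value:
try:
    levels[0].encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
```

The explicit encode("ascii") call fails on the first value in the same way the traceback does, while the unicode template produces the prefixed names without any encoding step. Whether this is the right fix inside get_dummies itself is for the maintainers to judge.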