get_dummies chokes on unicode values · Issue #6885 · pandas-dev/pandas
(Context: pandas version 0.13.1 running on Python 2.7.6 |Anaconda 1.9.1 (64-bit)| (default, Nov 11 2013, 10:49:15) [MSC v.1500 64 bit (AMD64)])
In my code I have a category containing lots of non-English names and want to create dummies out of it.
So I call:
dummies=pandas.get_dummies(data[cat], prefix=prefix)
and get:
c:\Anaconda\lib\site-packages\pandas\core\reshape.pyc in get_dummies(data, prefix, prefix_sep, dummy_na)
971 if prefix is not None:
972 dummy_cols = ['%s%s%s' % (prefix, prefix_sep, str(v))
--> 973 for v in levels]
974 else:
975 dummy_cols = levels
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 19: ordinal not in range(128)
The issue appears to be the call to str(v): if v is a unicode string containing non-ASCII characters, Python 2 implicitly encodes it with the ASCII codec, which raises UnicodeEncodeError.
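
To illustrate the diagnosis, here is a minimal sketch of building the column names without calling str(v). The helper name `make_dummy_cols` and the sample values are hypothetical, and this is not necessarily how pandas itself addressed the problem; it only shows that formatting into a unicode template avoids the implicit ASCII encode.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals  # string literals are unicode on Python 2 as well


def make_dummy_cols(levels, prefix, prefix_sep="_"):
    """Hypothetical helper: build dummy column names without str(v)."""
    # The template is a unicode string, so %s keeps non-ASCII characters
    # intact instead of encoding each level with the ASCII codec.
    return ["%s%s%s" % (prefix, prefix_sep, v) for v in levels]


levels = ["caf\xe9", "tea"]  # sample category values, one with a non-ASCII character
cols = make_dummy_cols(levels, "drink")

# What str(v) does implicitly on Python 2 for a unicode value:
try:
    levels[0].encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
```

Non-string levels such as integers still format correctly with `%s`, so dropping the explicit str() call should not change the names produced for ASCII-only data.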