json_normalize() can't deal with non-ascii characters in unicode keys · Issue #13213 · pandas-dev/pandas
Example code:
import json
import pandas as pd

testjson = u'''
[{"Ünicøde":0,"sub":{"A":1, "B":2}},
 {"Ünicøde":1,"sub":{"A":3, "B":4}}]
'''.encode('utf8')
pd.io.json.json_normalize(json.loads(testjson))
Output:
Traceback (most recent call last):
  File "...lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2885, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-12-f866f9c7ec7c>", line 5, in <module>
    pd.io.json.json_normalize(json.loads(testjson))
  File ".../lib/python2.7/site-packages/pandas/io/json.py", line 715, in json_normalize
    data = nested_to_record(data)
  File ".../lib/python2.7/site-packages/pandas/io/json.py", line 617, in nested_to_record
    newkey = str(k)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xdc' in position 0: ordinal not in range(128)
Expected output:
   sub.A  sub.B  Ünicøde
0      1      2        0
1      3      4        1
The cause is probably these two lines:
https://github.com/pydata/pandas/blob/master/pandas/io/json.py#L618
and https://github.com/pydata/pandas/blob/master/pandas/io/json.py#L620
Those lines were seemingly introduced to deal with numeric keys, but calling str() on k fails when k is a unicode object containing non-ASCII characters, because Python 2 then tries to encode it with the ascii codec.
It seems to be the same bug in principle as #13101
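For illustration, here is a minimal sketch of the kind of change that might avoid the error. It is not the actual pandas code: flatten_record and string_types are hypothetical names, and the function is only a simplified stand-in for nested_to_record. The idea is to call str() only on keys that are not already strings, so unicode keys are never pushed through the ascii codec.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

try:
    string_types = (basestring,)  # Python 2: covers both str and unicode
except NameError:
    string_types = (str,)         # Python 3: str is already unicode


def flatten_record(record, prefix="", sep="."):
    """Hypothetical, simplified stand-in for nested_to_record.

    Flattens nested dicts into a single dict with separator-joined keys.
    """
    flat = {}
    for k, v in record.items():
        # Coerce only non-string keys (e.g. ints). String keys are left
        # untouched, so non-ASCII unicode keys never go through str().
        key = k if isinstance(k, string_types) else str(k)
        new_key = prefix + sep + key if prefix else key
        if isinstance(v, dict):
            flat.update(flatten_record(v, new_key, sep))
        else:
            flat[new_key] = v
    return flat


record = {"Ünicøde": 0, "sub": {"A": 1, "B": 2}}
flat = flatten_record(record)
```

With this guard the numeric-key case still works (an int key 7 becomes "7"), while "Ünicøde" passes through unchanged. Whether the same check fits the real nested_to_record is an assumption that would need to be verified against the pandas source.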