BUG: ujson labels are encoded twice by Komnomnomnom · Pull Request #4593 · pandas-dev/pandas

With its current handling, ujson ends up encoding labels twice, which can cause problems if they contain characters that need escaping:

In [16]: df = DataFrame([['a', 'b'], ['c', 'd']], index=['index " 1', 'index / 2'], columns=['a \ b', 'y / z'])

In [17]: df
Out[17]:
          a \ b y / z
index " 1     a     b
index / 2     c     d

In [18]: json = df.to_json()

In [19]: json
Out[19]: '{"a \\\\ b":{"index \\\" 1":"a","index \\\/ 2":"c"},"y \\\/ z":{"index \\\" 1":"b","index \\\/ 2":"d"}}'

In [20]: pd.read_json(json)
Out[20]:
          a \ b y / z
index " 1     a     b
index / 2     c     d

This PR fixes this behaviour so labels are only encoded a single time.
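For illustration only, the effect of double encoding can be sketched with Python's standard `json` module rather than ujson. The helpers `encode_once` and `encode_twice` are hypothetical names and are not part of pandas or ujson; `encode_twice` simply escapes an already-escaped label again, which is an assumption about the shape of the bug, not the actual C code path. Note that the standard `json` module does not escape `/`, so only the quote and backslash labels are used here.

```python
import json

# Labels containing characters that JSON must escape.
labels = ['index " 1', 'a \\ b']


def encode_once(label):
    """Encode a label as a JSON string a single time (the intended behaviour)."""
    return json.dumps(label)


def encode_twice(label):
    """Hypothetical model of the bug: escape the label, then escape the result again."""
    escaped = json.dumps(label)[1:-1]  # escaped text without the surrounding quotes
    return json.dumps(escaped)


for label in labels:
    once = json.loads(encode_once(label))
    twice = json.loads(encode_twice(label))
    # A single encoding round-trips; a double encoding leaves stray backslashes.
    print(repr(label), "->", repr(once), "vs", repr(twice))
```

With a single encoding the decoded label equals the original, whereas the double-encoded label comes back with extra backslashes, which is the kind of mismatch a `to_json` / `read_json` round trip would expose.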