PERF: json_normalize · Issue #15621 · pandas-dev/pandas

Description

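I haven't looked much at the implementation, but I'm guessing that simple cases like this one (a single level of nesting, identical keys in every record) could be optimized. Here `json_normalize` takes about 20 s, versus under 1 s for a plain `pd.DataFrame` call on the same data.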

In [63]: data = [
    ...:     {'name': 'Name',
    ...:      'value': 1.0,
    ...:      'value2': 2.0,
    ...:      'nested': {'a': 'aa', 'b': 'bb'}}] * 1000000

In [64]: %timeit pd.DataFrame(data)
1 loop, best of 3: 847 ms per loop

In [65]: %timeit pd.io.json.json_normalize(data)
1 loop, best of 3: 20 s per loop
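
To illustrate the kind of simple case meant here, below is a rough sketch of flattening each record by hand before building the frame. This is not how `json_normalize` is implemented; `flatten_record` is a hypothetical helper written for this example, it only handles nested dicts (not lists of records), and it has not been benchmarked against the timings above.

```python
def flatten_record(record, sep="."):
    """Flatten nested dicts into a single flat dict.

    Hypothetical helper for illustration only. Nested keys are joined
    with `sep`, giving names like 'nested.a'.
    """
    flat = {}
    # Explicit stack instead of recursion: (key prefix, dict to walk)
    stack = [("", record)]
    while stack:
        prefix, obj = stack.pop()
        for key, value in obj.items():
            name = prefix + sep + str(key) if prefix else str(key)
            if isinstance(value, dict):
                # Defer nested dicts; their keys get the joined prefix
                stack.append((name, value))
            else:
                flat[name] = value
    return flat


# Same record shape as in the timing session, with fewer repetitions
record = {"name": "Name", "value": 1.0, "value2": 2.0,
          "nested": {"a": "aa", "b": "bb"}}
data = [record] * 1000

flat_rows = [flatten_record(r) for r in data]
```

The resulting `flat_rows` could then be passed to `pd.DataFrame(flat_rows)`, which would give columns such as `nested.a` and `nested.b`, the same dotted names that `json_normalize` produces for this input.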

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.19.2
nose: 1.3.7
pip: 8.1.2
setuptools: 23.0.0
Cython: 0.24.1
numpy: 1.11.2
scipy: 0.18.1
statsmodels: 0.6.1
xarray: 0.8.2
IPython: 5.1.0
sphinx: 1.3.1
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: 1.1.0
tables: 3.2.2
numexpr: 2.6.1
matplotlib: 1.5.3
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.2
lxml: 3.6.0
bs4: None
html5lib: 0.999999999
httplib2: 0.9.2
apiclient: 1.5.3
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.40.0
pandas_datareader: 0.2.1