HDFStore __unicode__ very slow for many keys · Issue #16503 · pandas-dev/pandas

Code Sample, a copy-pastable example if possible

import pandas as pd
store = pd.HDFStore('test.h5', 'w')
for i in range(5000):
    store.put('table_{}'.format(i), pd.DataFrame([i]))

%time str(store)
CPU times: user 26.1 s, sys: 156 ms, total: 26.2 s
Wall time: 26.2 s

Problem description

The __unicode__ method of HDFStore iterates over all the keys in the file to build the string representation. For files with many keys this operation becomes extremely slow and dumps an excessive amount of output to the console.

Worse, this completely bogs down PyCharm's debugger because it calls str(store) for every store that's in scope, on every step.
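
To make the cost pattern concrete without needing pandas or PyTables, here is a minimal sketch. EagerStore is a hypothetical stand-in, not the pandas implementation: it only mimics a string form that walks every key, and counts how many keys get touched when something like a debugger renders the object on every step.

```python
class EagerStore:
    """Hypothetical stand-in for a store whose string form lists every key."""

    def __init__(self, path, keys):
        self.path = path
        self._keys = list(keys)
        self.key_visits = 0  # how many keys were touched while formatting

    def __str__(self):
        lines = [f"{type(self).__name__} at {self.path}"]
        for key in self._keys:
            # In a real store this step would inspect a node in the file.
            self.key_visits += 1
            lines.append(f"/{key}")
        return "\n".join(lines)


store = EagerStore("test.h5", (f"table_{i}" for i in range(5000)))

# Simulate a debugger that renders the variable once per step.
for _step in range(10):
    text = str(store)

print(store.key_visits)        # 50000
print(len(text.splitlines()))  # 5001
```

Ten steps over 5000 keys means 50000 key visits and a 5001-line string each time, which is the same shape as the slowdown reported above, just without the per-key file access that makes the real case so much more expensive.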

Expected Output

__unicode__ should be a fast operation; showing just the file path would be sufficient. Detailed information on all the keys could go in a separate method, if it is needed at all.
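
A rough sketch of what I have in mind, again using a hypothetical stand-in class rather than HDFStore itself. The names LazyStore, info and limit are made up for illustration: the string form only reports the path, and the per-key listing is built only when explicitly requested.

```python
class LazyStore:
    """Hypothetical store-like class: cheap repr, key details on request."""

    def __init__(self, path, keys):
        self.path = path
        self._keys = list(keys)

    def __repr__(self):
        # Constant time: never touches the keys.
        return f"<{type(self).__name__} path={self.path!r}>"

    def info(self, limit=None):
        """Build the per-key listing only when the caller asks for it."""
        keys = self._keys if limit is None else self._keys[:limit]
        lines = [repr(self)] + [f"/{k}" for k in keys]
        hidden = len(self._keys) - len(keys)
        if hidden > 0:
            lines.append(f"... {hidden} more keys")
        return "\n".join(lines)


lazy = LazyStore("test.h5", [f"table_{i}" for i in range(5000)])

print(str(lazy))           # <LazyStore path='test.h5'>
print(lazy.info(limit=2))  # path line, two keys, then a "more keys" line
```

With this split, a debugger calling str() on every step pays a fixed cost regardless of how many keys the file holds, and anyone who wants the full listing can still get it.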

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-78-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_CA.UTF-8
LOCALE: en_CA.UTF-8

pandas: 0.19.2
nose: None
pip: 9.0.1
setuptools: 27.2.0
Cython: None
numpy: 1.12.1
scipy: 0.19.0
statsmodels: 0.8.0
xarray: None
IPython: 5.3.0
sphinx: 1.5.3
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: 3.3.0
numexpr: 2.6.2
matplotlib: 2.0.2
openpyxl: None
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 1.1.9
pymysql: None
psycopg2: 2.7.1 (dt dec pq3 ext lo64)
jinja2: 2.9.6
boto: None
pandas_datareader: None