HDF5 index corruption · Issue #8265 · pandas-dev/pandas

@rockg

Description

I generated a MultiIndexed DataFrame and wrote it to HDF5 using to_hdf, with zlib level 5 compression. The file was written all at once. It is located here: https://www.dropbox.com/s/122q55g5ubcf4fl/indexIssue.h5?dl=0

The two methods below should return identical results, but the first (select with a where clause) returns 2892 records, while the second (getting all values and subselecting on the path) returns 2972. The where-clause result is missing values for path 6 between 3-5-2015 20:00 and 3-6-2015 9:00. I tried using reindex on the table but that didn't fix anything. I don't really know what's going on.

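from pandas import HDFStore, Term

store = HDFStore(path_to_file, mode='r')

# Method 1: filter on Path inside the store with a where clause
p1 = store.select('ts', where=Term('Path', '=', 6), auto_close=False)
print(len(p1))  # 2892

# Method 2: read the whole table, then filter on the Path index level
p2 = store.select('ts', auto_close=False)
p2s = p2[p2.index.get_level_values('Path') == 6]
print(len(p2s))  # 2972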
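
One way to pin down exactly which rows the where clause drops is to compare the two indexes directly. The following is a minimal sketch on synthetic in-memory data, not the file from this report: the helper name `rows_missing_from`, the `Time` level name, and the `value` column are all hypothetical stand-ins. With the real store, `p1` and `p2s` from the snippet above would take the place of `filtered` and `expected`.

```python
import pandas as pd


def rows_missing_from(filtered: pd.DataFrame, expected: pd.DataFrame) -> pd.Index:
    """Return the index entries present in `expected` but absent from `filtered`."""
    return expected.index.difference(filtered.index)


# Synthetic stand-in for the stored table: a (Path, Time) MultiIndex.
# "Time" and "value" are assumed names; only "Path" appears in the report.
times = pd.to_datetime(
    ["2015-03-05 19:00", "2015-03-05 20:00", "2015-03-05 21:00"]
)
index = pd.MultiIndex.from_product([[5, 6], times], names=["Path", "Time"])
full = pd.DataFrame({"value": range(len(index))}, index=index)

# Equivalent of p2s: read everything, then filter on the Path level.
expected = full[full.index.get_level_values("Path") == 6]

# Stand-in for p1: simulate a where-clause result that lost the middle row.
filtered = expected.iloc[[0, 2]]

missing = rows_missing_from(filtered, expected)
print(missing)
```

If the missing entries cluster in one contiguous time range, as described above, that points at a problem with the on-disk index for that range rather than with the data itself.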