HDFStore: unable to create index, no error message · Issue #28156 · pandas-dev/pandas

I was trying to follow the documentation at https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#indexing but ran into an unintuitive bug with HDFStore index creation. I thought I would report it in case someone else runs across this problem.

First, I create 2 dataframes and an HDFStore:

    import pandas as pd
    import numpy as np

    df_1 = pd.DataFrame(np.random.randn(10, 2), columns=list('AB'))
    df_2 = pd.DataFrame(np.random.randn(10, 2), columns=list('AB'))
    st = pd.HDFStore('appends.h5', mode='w')

Now, when I append, if I do:

    st.append('df', df_1, data_columns=['B'], index=False)
    st.append('df', df_2, data_columns=['B'], index=False)

I can successfully create an index:

    st.create_table_index('df', columns=['B'], optlevel=9, kind='full')
    st.get_storer('df').table

    /df/table (Table(20,)) ''
      description := {
      "index": Int64Col(shape=(), dflt=0, pos=0),
      "values_block_0": Float64Col(shape=(1,), dflt=0.0, pos=1),
      "B": Float64Col(shape=(), dflt=0.0, pos=2)}
      byteorder := 'little'
      chunkshape := (2730,)
      autoindex := True
      colindexes := {
        "B": Index(9, full, shuffle, zlib(1)).is_csi=True}

But if I instead leave out the data_columns:

    st.append('df', df_1, index=False)
    st.append('df', df_2, index=False)

no index is created, and no error is raised:

    st.create_table_index('df', columns=['B'], optlevel=9, kind='full')
    st.get_storer('df').table

    /df/table (Table(20,)) ''
      description := {
      "index": Int64Col(shape=(), dflt=0, pos=0),
      "values_block_0": Float64Col(shape=(2,), dflt=0.0, pos=1)}
      byteorder := 'little'
      chunkshape := (2730,)

This is unintuitive for 2 reasons:

  1. Why does HDFStore need to know the indexable columns during append and during create_table_index?
  2. Why doesn't create_table_index raise an error message when it isn't able to create an index?

I think fixing either 1 or 2 would make things much more intuitive.
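
In the meantime, here is a rough sketch of the kind of check I would expect for point 2. The wrapper name `create_index_checked` and the store keys `with_dc` / `without_dc` are made up for illustration and are not part of pandas. It only assumes that the underlying PyTables table exposes `colnames` and that each column has an `is_indexed` property, which matches the table descriptions printed above (`B` only appears as its own column when it was passed in `data_columns`).

```python
import os
import tempfile

import numpy as np
import pandas as pd


def create_index_checked(store, key, columns, **kwargs):
    """Hypothetical wrapper: fail loudly if a requested column cannot be indexed."""
    table = store.get_storer(key).table  # underlying PyTables Table node
    # Columns not passed via data_columns are packed into values_block_*
    # and therefore do not show up as separate table columns.
    missing = [c for c in columns if c not in table.colnames]
    if missing:
        raise ValueError(
            f"{missing} not stored as data columns for key {key!r}; "
            "re-append with data_columns=... to make them indexable"
        )
    store.create_table_index(key, columns=columns, **kwargs)
    # Report which of the requested columns now carry an index.
    return [c for c in columns if getattr(table.cols, c).is_indexed]


df = pd.DataFrame(np.random.randn(10, 2), columns=list("AB"))
path = os.path.join(tempfile.mkdtemp(), "appends.h5")

with pd.HDFStore(path, mode="w") as st:
    st.append("with_dc", df, data_columns=["B"], index=False)
    st.append("without_dc", df, index=False)

    # B is a data column here, so the index can be created.
    indexed = create_index_checked(st, "with_dc", ["B"], optlevel=9, kind="full")

    # B is not a data column here, so the wrapper raises instead of
    # silently doing nothing.
    try:
        create_index_checked(st, "without_dc", ["B"], optlevel=9, kind="full")
        error_message = None
    except ValueError as exc:
        error_message = str(exc)
```

With something like this, the second case would surface a clear message pointing at `data_columns` rather than returning as if the index had been built.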