HDFStore: unable to create index, no error message · Issue #28156 · pandas-dev/pandas
I was trying to follow the documentation at https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#indexing but ran into an unintuitive bug with HDFStore index creation. I thought I would report it in case someone else runs across this problem.
First, I create 2 dataframes and an HDFStore:

    import pandas as pd
    import numpy as np

    df_1 = pd.DataFrame(np.random.randn(10, 2), columns=list('AB'))
    df_2 = pd.DataFrame(np.random.randn(10, 2), columns=list('AB'))
    st = pd.HDFStore('appends.h5', mode='w')

Now, when I append, if I do:

    st.append('df', df_1, data_columns=['B'], index=False)
    st.append('df', df_2, data_columns=['B'], index=False)

I can successfully create an index:

    st.create_table_index('df', columns=['B'], optlevel=9, kind='full')
    st.get_storer('df').table

    /df/table (Table(20,)) ''
      description := {
      "index": Int64Col(shape=(), dflt=0, pos=0),
      "values_block_0": Float64Col(shape=(1,), dflt=0.0, pos=1),
      "B": Float64Col(shape=(), dflt=0.0, pos=2)}
      byteorder := 'little'
      chunkshape := (2730,)
      autoindex := True
      colindexes := {
        "B": Index(9, full, shuffle, zlib(1)).is_csi=True}

But if I instead leave out data_columns:

    st.append('df', df_1, index=False)
    st.append('df', df_2, index=False)

no index is created:

    st.create_table_index('df', columns=['B'], optlevel=9, kind='full')
    st.get_storer('df').table

    /df/table (Table(20,)) ''
      description := {
      "index": Int64Col(shape=(), dflt=0, pos=0),
      "values_block_0": Float64Col(shape=(2,), dflt=0.0, pos=1)}
      byteorder := 'little'
      chunkshape := (2730,)

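For anyone who wants to check programmatically rather than reading the table repr, here is a rough sketch. The helper name indexed_columns is mine and not part of pandas. It assumes the store holds a table-format node and relies only on the colnames and colindexes attributes of the underlying PyTables table, which are the same things shown in the output above.

```python
def indexed_columns(store, key):
    """Map each on-disk column of `key` to whether it has a PyTables index.

    Hypothetical helper (not a pandas API). `store` is expected to behave
    like an open pd.HDFStore holding a table-format node at `key`.
    """
    table = store.get_storer(key).table
    # colnames lists the on-disk columns; colindexes is a dict-like
    # mapping that only contains columns that actually have an index.
    return {name: name in table.colindexes for name in table.colnames}
```

With the second set of appends above, I would expect indexed_columns(st, 'df') to contain no entry for 'B' at all, because 'B' is packed into values_block_0 instead of being stored as its own column.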
This is unintuitive for 2 reasons:
1. Why does HDFStore need to know the indexable columns during append and during create_table_index?
2. Why doesn't create_table_index raise an error message when it isn't able to create an index?
I think fixing either 1 or 2 would make things much more intuitive.
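As a stopgap for point 2, something like the wrapper below could be used on the user side. This is only a sketch: create_table_index_strict is a made-up name, the ValueError is my choice, and the check assumes that a column can only be indexed if it appears in the colnames of the underlying PyTables table (which matches what I see in the output above, but I have not verified it against the pandas internals).

```python
def create_table_index_strict(store, key, columns, **kwargs):
    """Like store.create_table_index, but fail loudly on unknown columns.

    Hypothetical wrapper (not a pandas API). `store` is expected to behave
    like an open pd.HDFStore holding a table-format node at `key`.
    """
    table = store.get_storer(key).table
    # Columns that were not declared via data_columns at append time are
    # packed into values_block_* and do not appear in colnames.
    missing = [col for col in columns if col not in table.colnames]
    if missing:
        raise ValueError(
            f"cannot index {missing} for key {key!r}: not stored as separate "
            "columns; pass them via data_columns when appending"
        )
    # Extra keyword arguments (e.g. optlevel, kind) are forwarded unchanged.
    store.create_table_index(key, columns=columns, **kwargs)
```

Calling create_table_index_strict(st, 'df', ['B'], optlevel=9, kind='full') after the second set of appends should then raise instead of silently doing nothing.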