BUG: ValueError when doing HDFStore.Select of contiguous mixed-data table ft. VLArray · Issue #17021 · pandas-dev/pandas

Code Sample, a copy-pastable example if possible

import pandas as pd

# Mixed-dtype frame: one integer column and one string (object) column.
myDf = pd.DataFrame({'a': pd.Series([1443525810, 1443540836, 1443609470]),
                     'b': pd.Series(['ab', 'cd', 'ab'])})
myDf.to_hdf('test.h5', key='test')

with pd.HDFStore('test.h5') as myFile:
    df = myFile.select('/test', start=0, stop=2)  # omit "start=0, stop=2" to prevent error
    print(df)

Problem description

Selecting a row range with start and stop raises:

ValueError: Shape of passed values is (2, 3), indices imply (2, 2)

Expected Output

             a   b
0   1443525810  ab
1   1443540836  cd
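
As the comment in the code sample notes, the error does not occur when start and stop are omitted. A possible workaround, sketched under that assumption, is to read the whole table and slice the rows afterwards. The helper name select_rows is hypothetical and not part of pandas, and this approach loads the full table into memory.

```python
import pandas as pd


def select_rows(store, key, start, stop):
    """Hypothetical helper: read the full table, then slice rows in memory.

    Avoids passing start/stop to HDFStore.select, which triggers the
    ValueError described above for this kind of mixed-dtype table.
    """
    full = store.select(key)        # no start/stop, so no error is expected
    return full.iloc[start:stop]    # positional row slice on the DataFrame
```

Usage would look like `with pd.HDFStore('test.h5') as myFile: df = select_rows(myFile, '/test', 0, 2)`.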

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.3.final.0
python-bits: 64
OS: Windows
OS-release: 2012ServerR2
machine: AMD64
processor: Intel64 Family 6 Model 62 Stepping 4, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.20.2
pytest: None
pip: 9.0.1
setuptools: 27.2.0
Cython: None
numpy: 1.13.1
scipy: None
xarray: None
IPython: 6.1.0
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999
sqlalchemy: 1.1.11
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

Other remarks:

def read_array(self, key, start=None, stop=None):
    """ read an array for the specified node (off of group) """
    import tables
    node = getattr(self.group, key)
    attrs = node._v_attrs
    transposed = getattr(attrs, 'transposed', False)
    if isinstance(node, tables.VLArray):
        ret = node[0][start:stop]
    else:
        dtype = getattr(attrs, 'value_type', None)
        shape = getattr(attrs, 'shape', None)
        if shape is not None:
            # length 0 axis
            ret = np.empty(shape, dtype=dtype)
        else:
            ret = node[start:stop]
        if dtype == u('datetime64'):
            # reconstruct a timezone if indicated
            ret = _set_tz(ret, getattr(attrs, 'tz', None), coerce=True)
        elif dtype == u('timedelta64'):
            ret = np.asarray(ret, dtype='m8[ns]')
    if transposed:
        return ret.T
    else:
        return ret
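
The effect of the [start:stop] slice in the VLArray branch can be sketched with NumPy alone. The array below is a stand-in for node[0], not a real PyTables node, and the rows-first layout is an assumption suggested by the transposed attribute. Without the slice, the object column would keep all 3 rows while the index is cut to 2, which is consistent with the reported shape mismatch.

```python
import numpy as np

# Stand-in for node[0]: the object values of column 'b'.
# Assumption: stored rows-first (3 rows, 1 column), as the 'transposed' flag suggests.
stored = np.array([['ab'], ['cd'], ['ab']], dtype=object)
start, stop = 0, 2

unsliced = stored.T             # ignoring start/stop: all 3 rows survive
sliced = stored[start:stop].T   # applying start/stop: only the 2 requested rows

print(unsliced.shape)  # (1, 3)
print(sliced.shape)    # (1, 2)
```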