BUG: iloc fails with non lex-sorted MultiIndex · Issue #13797 · pandas-dev/pandas

Code Sample, a copy-pastable example if possible

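import pandas as pd
import numpy as np

# Two-level index labels; the first level of `ind` is lex sorted ('AA' < 'BB')
ind = [
    ['AA', 'AA', 'AA', 'BB', 'BB'],
    ['A', 'B', 'B', 'a', 'b'],
]
# Same shape, but the first level is not lex sorted ('CC' comes before 'BB')
ind_nonLex = [
    ['CC', 'CC', 'CC', 'BB', 'BB'],
    ['A', 'B', 'B', 'a', 'b'],
]

# A single column of Python strings
strCol = pd.DataFrame([['fooA'], ['fooB'], ['fooC'], ['fooD'], ['fooE']])

# String column followed by five integer columns (mixed, object-dtype values)
dat = np.arange(1, 26).reshape(5, 5)
df = pd.concat([strCol, pd.DataFrame(dat)], axis=1)
df1 = pd.DataFrame(df.values, index=ind_nonLex)
df2 = pd.DataFrame(df.values, index=ind)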

df1 (whose index is not lex sorted) fails with iloc access:

>>> df1.iloc[0,0]
C:\Users\rikuhiro\Anaconda3\envs\pd-check\lib\site-packages\ipykernel\__main__.py:1: PerformanceWarning: indexing past lexsort depth may impact performance.
  if __name__ == '__main__':
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-37-884ae2904642> in <module>()
----> 1 df1.iloc[0,0]

C:\Users\rikuhiro\Anaconda3\envs\pd-check\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key)
   1292 
   1293         if type(key) is tuple:
-> 1294             return self._getitem_tuple(key)
   1295         else:
   1296             return self._getitem_axis(key, axis=0)

C:\Users\rikuhiro\Anaconda3\envs\pd-check\lib\site-packages\pandas\core\indexing.py in _getitem_tuple(self, tup)
   1561 
   1562             # if the dim was reduced, then pass a lower-dim the next time
-> 1563             if retval.ndim < self.ndim:
   1564                 axis -= 1
   1565 

AttributeError: 'str' object has no attribute 'ndim'

Expected Output

df2 (whose index is lex sorted) works as expected: df2.iloc[0,0] returns 'fooA'.

This AttributeError does not occur when the DataFrame's values are NumPy scalars (e.g. numpy.int32), because those have an ndim attribute. (The PerformanceWarning still appears, but that may be a separate issue.)
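The difference between the two kinds of values can be checked directly. This is a minimal sketch that assumes only that NumPy is installed; the variable names are chosen for illustration. NumPy scalars expose ndim (equal to 0), while a built-in str does not, which is why the retval.ndim access in the traceback fails only for the string cell.

```python
import numpy as np

# NumPy scalars carry array-like attributes, including ndim (0 for a scalar).
int_scalar = np.int32(1)
print(int_scalar.ndim)               # 0

# A plain Python string has no ndim attribute, so `retval.ndim` would raise.
str_scalar = 'fooA'
print(hasattr(str_scalar, 'ndim'))   # False
```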

I found that adding an if statement to _getitem_tuple(self, tup) in pandas/core/indexing.py remedies this. The check simply makes _getitem_tuple aware of objects that lack the ndim attribute, as _getitem_nested_tuple(self, tup) already is. (I will prepare a pull request if that would be helpful.)

@@ -1569,6 +1569,10 @@ def _getitem_tuple(self, tup):
             retval = getattr(retval, self.name)._getitem_axis(key, axis=axis)
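The added lines of the hunk are not reproduced here. As a rough illustration of the kind of guard described, and not the actual patch, the standalone sketch below checks for ndim before comparing it. The function dim_was_reduced and the classes FakeSeries and FakeFrame are hypothetical names invented for this example; they are not part of pandas.

```python
def dim_was_reduced(retval, expected_ndim):
    """Return True if `retval` has fewer dimensions than `expected_ndim`.

    Hypothetical helper: objects without an ndim attribute (for example a
    plain str) are treated as fully reduced instead of raising
    AttributeError on `retval.ndim`.
    """
    if not hasattr(retval, 'ndim'):
        return True
    return retval.ndim < expected_ndim


class FakeSeries:
    # Stand-in for a 1-D pandas object; only ndim matters for this sketch.
    ndim = 1


class FakeFrame:
    # Stand-in for a 2-D pandas object.
    ndim = 2


print(dim_was_reduced(FakeFrame(), 2))   # False: still 2-D
print(dim_was_reduced(FakeSeries(), 2))  # True: reduced to 1-D
print(dim_was_reduced('fooA', 2))        # True: no ndim, and no AttributeError
```

Whether the real fix should return early, break out of the loop, or fall back to a default dimension is a design choice this sketch does not settle.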

output of pd.show_versions()

>>> pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.18.1
nose: None
pip: 8.1.2
setuptools: 23.0.0
Cython: None
numpy: 1.11.1
scipy: None
statsmodels: None
xarray: None
IPython: 5.0.0
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.6.1
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None