BUG: iloc fails with non lex-sorted MultiIndex · Issue #13797 · pandas-dev/pandas
Code Sample, a copy-pastable example if possible
import pandas as pd
import numpy as np

ind = [
    ['AA', 'AA', 'AA', 'BB', 'BB'],
    ['A', 'B', 'B', 'a', 'b'],
]
ind_nonLex = [
    ['CC', 'CC', 'CC', 'BB', 'BB'],
    ['A', 'B', 'B', 'a', 'b'],
]

strCol = pd.DataFrame([['fooA'], ['fooB'], ['fooC'], ['fooD'], ['fooE']])
dat = np.arange(1, 26).reshape(5, 5)
df = pd.concat([strCol, pd.DataFrame(dat)], axis=1)
df1 = pd.DataFrame(df.values, index=ind_nonLex)
df2 = pd.DataFrame(df.values, index=ind)
df1 (whose index is not lex sorted) fails with iloc access:
>>> df1.iloc[0,0]
C:\Users\rikuhiro\Anaconda3\envs\pd-check\lib\site-packages\ipykernel\__main__.py:1: PerformanceWarning: indexing past lexsort depth may impact performance.
if __name__ == '__main__':
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-37-884ae2904642> in <module>()
----> 1 df1.iloc[0,0]
C:\Users\rikuhiro\Anaconda3\envs\pd-check\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key)
1292
1293 if type(key) is tuple:
-> 1294 return self._getitem_tuple(key)
1295 else:
1296 return self._getitem_axis(key, axis=0)
C:\Users\rikuhiro\Anaconda3\envs\pd-check\lib\site-packages\pandas\core\indexing.py in _getitem_tuple(self, tup)
1561
1562 # if the dim was reduced, then pass a lower-dim the next time
-> 1563 if retval.ndim < self.ndim:
1564 axis -= 1
1565
AttributeError: 'str' object has no attribute 'ndim'
Expected Output
df2 (whose index is lex sorted) works as expected:
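For reference, the expected behaviour can be checked with a short sketch like the one below. It rebuilds the two frames from the code sample and uses `is_monotonic_increasing` as a stand-in for "lex sorted"; that choice of property is my assumption, not something the report itself uses.

```python
import numpy as np
import pandas as pd

ind = [["AA", "AA", "AA", "BB", "BB"], ["A", "B", "B", "a", "b"]]
ind_nonLex = [["CC", "CC", "CC", "BB", "BB"], ["A", "B", "B", "a", "b"]]

strCol = pd.DataFrame([["fooA"], ["fooB"], ["fooC"], ["fooD"], ["fooE"]])
dat = np.arange(1, 26).reshape(5, 5)
df = pd.concat([strCol, pd.DataFrame(dat)], axis=1)
df1 = pd.DataFrame(df.values, index=ind_nonLex)
df2 = pd.DataFrame(df.values, index=ind)

# Positional access on the lex-sorted frame returns the first string cell.
first_cell = df2.iloc[0, 0]
print(first_cell)  # fooA

# Assumption: monotonicity of the MultiIndex is used as a proxy for lex-sortedness.
print(df2.index.is_monotonic_increasing)  # True
print(df1.index.is_monotonic_increasing)  # False
```

The same `df1.iloc[0, 0]` call is the one that raised on pandas 0.18.1 in the traceback above, so it is left out of the sketch.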
This AttributeError does not occur when DataFrame.values consists of numpy objects (e.g. numpy.int32), because they have the ndim attribute. (The performance warning still appears, but that may be a separate issue.)
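The difference between the two kinds of values can be seen directly. This is only an illustration of the attribute check; the helper name `has_ndim` is hypothetical and not part of pandas.

```python
import numpy as np


def has_ndim(value):
    """Return True if `value` exposes an `ndim` attribute (hypothetical helper)."""
    return hasattr(value, "ndim")


print(has_ndim(np.int32(1)))  # True: NumPy scalars carry ndim
print(np.int32(1).ndim)       # 0
print(has_ndim("fooA"))       # False: a plain Python str does not
```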
I found that adding an if statement to _getitem_tuple(self, tup) in pandas/core/indexing.py remedies this. It simply makes _getitem_tuple aware of objects without the ndim attribute, as _getitem_nested_tuple(self, tup) is. (I will prepare a pull request if it is helpful.)
@@ -1569,6 +1569,10 @@ def _getitem_tuple(self, tup):
             retval = getattr(retval, self.name)._getitem_axis(key, axis=axis)

+            # if we have a scalar, we are done
+            if lib.isscalar(retval) or not hasattr(retval, 'ndim'):
+                break
+
             # if the dim was reduced, then pass a lower-dim the next time
             if retval.ndim < self.ndim:
                 axis -= 1
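To show the effect of the guard in isolation, here is a toy model of the per-axis loop. It is not pandas code: `getitem_tuple_sketch` and its `guard` flag are hypothetical, it indexes a plain NumPy object array instead of a DataFrame, and it assumes a pandas version that exposes `pandas.api.types.is_scalar` in place of the internal `lib.isscalar` used in the patch.

```python
import numpy as np
from pandas.api.types import is_scalar


def getitem_tuple_sketch(values, tup, guard=True):
    """Toy stand-in for the loop in _getitem_tuple (hypothetical, not pandas code)."""
    retval = values
    ndim = values.ndim
    axis = 0
    for key in tup:
        # Index along the current axis only.
        retval = retval[(slice(None),) * axis + (key,)]

        # Proposed guard: if we have a scalar, we are done.
        if guard and (is_scalar(retval) or not hasattr(retval, "ndim")):
            break

        # If the dim was reduced, pass a lower-dim the next time.
        if retval.ndim < ndim:
            axis -= 1
        axis += 1
    return retval


values = np.array([["fooA", 1], ["fooB", 2]], dtype=object)

with_guard = getitem_tuple_sketch(values, (0, 0))
print(with_guard)  # fooA

# Without the guard, the ndim check runs on a str and fails as in the traceback.
try:
    getitem_tuple_sketch(values, (0, 0), guard=False)
    unguarded_error = None
except AttributeError as exc:
    unguarded_error = exc
print(type(unguarded_error).__name__)  # AttributeError
```

The `hasattr` half of the condition is what covers objects that are neither recognised as scalars nor array-like, which is the case the patch is meant to handle.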
output of pd.show_versions()
>>> pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
pandas: 0.18.1
nose: None
pip: 8.1.2
setuptools: 23.0.0
Cython: None
numpy: 1.11.1
scipy: None
statsmodels: None
xarray: None
IPython: 5.0.0
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.6.1
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None