pandas-dev/pandas
Code Sample
import pandas as pd

# 999,999 rows with a two-level MultiIndex
r = list(range(999999))
df2 = pd.DataFrame({'a': r, 'b': r}, index=pd.MultiIndex.from_tuples([(x, x) for x in r]))
df2['a'].blabla()
Result, as expected: AttributeError: 'Series' object has no attribute 'blabla'
# Same frame with 1,000,000 rows
r = list(range(1000000))
df2 = pd.DataFrame({'a': r, 'b': r}, index=pd.MultiIndex.from_tuples([(x, x) for x in r]))
df2['a'].blabla()
# Unexpected result: TypeError: unorderable types: numpy.ndarray() < str()
from scipy.stats.mstats import winsorize
winsorize(df2['a'])
# Result: TypeError: unorderable types: numpy.ndarray() < str()
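Why winsorize fails the same way is not stated in the report. One plausible path, assumed here rather than verified against these versions, is that winsorize wraps its input in numpy.ma.array, which probes the object with three-argument getattr(obj, name, default). A pandas Series answers unknown attribute names by checking whether the name is an index label (the same check df2['a'].blabla goes through), and three-argument getattr only suppresses AttributeError, so a TypeError raised by that label check escapes. The sketch below uses hypothetical stand-ins (ComparingIndex, HashingIndex, FakeSeries, probe), not pandas classes, and tuple labels where the real error involves numpy.ndarray values.

```python
class ComparingIndex:
    """Hypothetical index whose membership test orders the key against labels."""

    def __init__(self, labels):
        self.labels = list(labels)

    def __contains__(self, key):
        # Equality is derived from '<', as in a binary search. Ordering a str
        # against a tuple raises TypeError in Python 3 rather than giving False.
        return any(not (label < key or key < label) for label in self.labels)


class HashingIndex:
    """Hypothetical index whose membership test is a plain hash lookup."""

    def __init__(self, labels):
        self.labels = set(labels)

    def __contains__(self, key):
        return key in self.labels  # unknown key of another type -> False


class FakeSeries:
    """Hypothetical stand-in for a Series that exposes index labels as attributes."""

    def __init__(self, index):
        self.index = index

    def __getattr__(self, name):
        # Runs only after normal attribute lookup fails.
        if name in self.index:
            return name
        raise AttributeError(name)


def probe(obj, name):
    """Return the exception type that escapes getattr(obj, name, None), or None."""
    try:
        getattr(obj, name, None)  # the default only covers AttributeError
    except Exception as exc:
        return type(exc)
    return None


labels = [(0, 0), (1, 1)]
print(probe(FakeSeries(HashingIndex(labels)), "_baseclass"))    # None
print(probe(FakeSeries(ComparingIndex(labels)), "_baseclass"))  # <class 'TypeError'>
```

With the hash-based index the unknown name falls back to the default; with the comparison-based index the TypeError reaches the caller, which matches the symptom reported for both calls.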
Problem description
When a DataFrame with a MultiIndex reaches one million rows or more (999,999 rows behave as expected), calling a non-existent method on a column raises this TypeError instead of the expected AttributeError. Calling the winsorize function from scipy.stats.mstats on a column raises the same TypeError. The error originates here: File "pandas/index.pyx", line 481, in pandas.index._bin_search
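The switch at exactly 1,000,000 rows, together with a traceback that ends in _bin_search, suggests that the index engine stops using a hash-table lookup and falls back to a binary search once the index reaches a size cutoff. The following is a minimal sketch of that hypothesis, not the pandas implementation: SketchEngine and SIZE_CUTOFF are illustrative names, and the cutoff is a parameter so both paths can be exercised with five labels instead of a million.

```python
import bisect

SIZE_CUTOFF = 1000000  # hypothetical; mirrors the row count at which the bug appears


class SketchEngine:
    """Hypothetical label-to-position engine: hash table below the cutoff,
    binary search over sorted labels at or above it."""

    def __init__(self, labels, cutoff=SIZE_CUTOFF):
        self.labels = sorted(labels)
        self.use_bin_search = len(self.labels) >= cutoff
        if not self.use_bin_search:
            self.mapping = {label: i for i, label in enumerate(self.labels)}

    def get_loc(self, key):
        if self.use_bin_search:
            # bisect_left compares labels with key using '<', so a key whose
            # type cannot be ordered against the labels raises TypeError here.
            i = bisect.bisect_left(self.labels, key)
            if i < len(self.labels) and self.labels[i] == key:
                return i
            raise KeyError(key)
        return self.mapping[key]  # any missing hashable key -> KeyError

    def __contains__(self, key):
        try:
            self.get_loc(key)
        except KeyError:  # a TypeError from the binary-search path is not caught
            return False
        return True


labels = [(x, x) for x in range(5)]
print("blabla" in SketchEngine(labels, cutoff=6))  # False: hash path
try:
    "blabla" in SketchEngine(labels, cutoff=5)     # binary-search path
except TypeError as exc:
    print("TypeError:", exc)
```

On the hash path an unknown string is simply not found; on the binary-search path the same lookup raises before equality is ever checked, so a membership test that only catches KeyError lets the TypeError through.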
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-98-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: nb_NO.UTF-8
LOCALE: nb_NO.UTF-8
pandas: 0.21.0
pytest: 3.2.2
pip: 8.1.1
setuptools: 20.7.0
Cython: 0.26.1
numpy: 1.13.1
scipy: 0.19.1
pyarrow: None
xarray: 0.9.6
IPython: 2.4.1
sphinx: None
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: 2.4.4
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.4.1
html5lib: 0.999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None