MultiIndex.isin fails in 0.20.x when values is empty iterable · Issue #16777 · pandas-dev/pandas
In 0.19.x, the implementation of MultiIndex.isin allowed an empty iterable to be passed:
In [3]: pd.MultiIndex([['foo', 'bar'],['a', 'b']], [[0, 1], [1, 0]]).isin([])
Out[3]: array([False, False], dtype=bool)
However, in 0.20.2 it now raises an exception:
In [1]: pd.MultiIndex([['foo', 'bar'],['a', 'b']], [[0, 1], [1, 0]]).isin([])
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-57-94faf7c7f3d5> in <module>()
----> 1 pd.MultiIndex([['foo', 'bar'],['a', 'b']], [[0, 1], [1, 0]]).isin([])
/usr/local/lib/python2.7/site-packages/pandas/core/indexes/multi.pyc in isin(self, values, level)
2625 if level is None:
2626 return algos.isin(self.values,
-> 2627 MultiIndex.from_tuples(values).values)
2628 else:
2629 num = self._get_level_number(level)
/usr/local/lib/python2.7/site-packages/pandas/core/indexes/multi.pyc in from_tuples(cls, tuples, sortorder, names)
1136 if len(tuples) == 0:
1137 # I think this is right? Not quite sure...
-> 1138 raise TypeError('Cannot infer number of levels from empty list')
1139
1140 if isinstance(tuples, (np.ndarray, Index)):
TypeError: Cannot infer number of levels from empty list
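As a stopgap on affected versions, the empty case can be handled before MultiIndex.isin is ever called. The sketch below is only a caller-side workaround, not the library fix: safe_isin is a hypothetical helper name, and it assumes the 0.19.x behaviour (an all-False boolean array of the same length as the index) is what callers want.

```python
import numpy as np
import pandas as pd


def safe_isin(index, values):
    """Hypothetical wrapper: return all-False for empty `values`.

    Mirrors the 0.19.x result for an empty iterable and defers to
    Index.isin for everything else.
    """
    values = list(values)  # also accepts generators and other iterables
    if len(values) == 0:
        # Nothing can match an empty collection of candidates.
        return np.zeros(len(index), dtype=bool)
    return index.isin(values)


# Same entries as the index in the report: ('foo', 'b') and ('bar', 'a').
mi = pd.MultiIndex.from_tuples([("foo", "b"), ("bar", "a")])

empty_result = safe_isin(mi, [])
match_result = safe_isin(mi, [("foo", "b")])
```

The wrapper never reaches MultiIndex.from_tuples when values is empty, so it avoids the code path shown in the traceback.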
Problem description
This occurs because the new implementation constructs a new MultiIndex from the values parameter, and MultiIndex.from_tuples fails to generate empty MultiIndex objects (see Issue #263).
I attempted to fix this by changing the isin function:
return algos.isin(self.values,
MultiIndex.from_tuples(values, level=self._levels).values)
and also fixing the from_tuples function to only error out if names does not provide a way to infer the number of levels:
if len(tuples) == 0 and names is None:
    raise TypeError('Cannot infer number of levels from empty list')
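To make the intent of that relaxed check concrete, here is a standalone sketch of the inference rule in plain Python. infer_nlevels is a hypothetical function written for illustration and is not part of pandas; it only models the decision "raise when the input is empty and names is missing, otherwise take the level count from names or from the first tuple".

```python
def infer_nlevels(tuples, names=None):
    """Hypothetical model of the relaxed check, not pandas code."""
    tuples = list(tuples)
    if len(tuples) == 0:
        if names is None:
            # Neither the data nor the names say how many levels exist.
            raise TypeError("Cannot infer number of levels from empty list")
        # Empty data, but names tell us the number of levels.
        return len(names)
    # Non-empty data: the width of the first tuple gives the level count.
    return len(tuples[0])


levels_from_names = infer_nlevels([], names=["outer", "inner"])
levels_from_data = infer_nlevels([("foo", "b"), ("bar", "a")])
```

With names supplied, an empty input yields a level count of two here instead of an exception, which is the behaviour the changed condition is meant to allow.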
However, MultiIndex.from_arrays also didn't like the empty lists, and the last change is a bit ugly:
if names is not None and len(names) > 0 and len(levels) == 0:
    levels = [[]*len(names)]
    labels = [[]*len(names)]
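One thing worth checking before this goes into a PR: as written, [[]*len(names)] multiplies the inner empty list, so it produces a single empty list regardless of how many names there are. If the goal is one empty list per level, a comprehension is the usual way to get that. The snippet below is plain Python with made-up names, just to show the difference.

```python
names = ["outer", "inner"]  # illustrative names only

# As written in the snippet: [] * 2 is still [], so only one list results.
as_written = [[] * len(names)]

# One independent empty list per name.
per_level = [[] for _ in names]

# Multiplying the outer list gives the right length, but every entry
# is the same list object, which is easy to mutate by accident.
shared = [[]] * len(names)
same_object = shared[0] is shared[1]
```

Here as_written has length 1 while per_level has length 2, and same_object is True for the [[]] * n form.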
Happy to turn this into a PR if there are no objections.
Output of pd.show_versions()
pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Darwin
OS-release: 16.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None
pandas: 0.20.2
pytest: 3.0.3
pip: 9.0.1
setuptools: 36.0.1
Cython: 0.25.2
numpy: 1.13.0
scipy: 0.19.1
xarray: None
IPython: 5.3.0
sphinx: 1.5.1
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.0
tables: None
numexpr: 2.6.1
feather: None
matplotlib: 2.0.0
openpyxl: 2.3.5
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.3
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: 1.1.11
pymysql: None
psycopg2: 2.7.1 (dt dec pq3 ext lo64)
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None