MultiIndex.isin fails in 0.20.x when values is empty iterable · Issue #16777 · pandas-dev/pandas

In 0.19.x, MultiIndex.isin accepted an empty iterable:

In [3]: pd.MultiIndex([['foo', 'bar'],['a', 'b']], [[0, 1], [1, 0]]).isin([])
Out[3]: array([False, False], dtype=bool)

However, in 0.20.2 the same call raises an exception:

In [1]: pd.MultiIndex([['foo', 'bar'],['a', 'b']], [[0, 1], [1, 0]]).isin([])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-57-94faf7c7f3d5> in <module>()
----> 1 pd.MultiIndex([['foo', 'bar'],['a', 'b']], [[0, 1], [1, 0]]).isin([])

/usr/local/lib/python2.7/site-packages/pandas/core/indexes/multi.pyc in isin(self, values, level)
   2625         if level is None:
   2626             return algos.isin(self.values,
-> 2627                               MultiIndex.from_tuples(values).values)
   2628         else:
   2629             num = self._get_level_number(level)

/usr/local/lib/python2.7/site-packages/pandas/core/indexes/multi.pyc in from_tuples(cls, tuples, sortorder, names)
   1136         if len(tuples) == 0:
   1137             # I think this is right? Not quite sure...
-> 1138             raise TypeError('Cannot infer number of levels from empty list')
   1139
   1140         if isinstance(tuples, (np.ndarray, Index)):

TypeError: Cannot infer number of levels from empty list

Problem description

This occurs because the new implementation constructs a new MultiIndex from the values parameter, and MultiIndex.from_tuples raises when given an empty list, so it cannot build an empty MultiIndex (see Issue #263).
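Until this is resolved, a caller-side workaround could short-circuit the empty case. The following is only a minimal sketch, assuming the 0.19.x result (all False) is the desired behaviour; `isin_allow_empty` is a hypothetical helper and not part of pandas.

```python
import numpy as np
import pandas as pd


def isin_allow_empty(index, values):
    """Hypothetical helper: index.isin(values), but all-False for empty values."""
    values = list(values)  # materialise so any iterable can be length-checked
    if len(values) == 0:
        # Nothing to match against, so no entry of the index is a member.
        return np.zeros(len(index), dtype=bool)
    return index.isin(values)


# Same entries as the index in the report: ('foo', 'b') and ('bar', 'a').
mi = pd.MultiIndex.from_arrays([['foo', 'bar'], ['b', 'a']])

print(isin_allow_empty(mi, []))
print(isin_allow_empty(mi, [('foo', 'b')]))
```

This sidesteps from_tuples entirely for empty input rather than fixing it.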

I attempted to fix this by changing the isin function:

return algos.isin(self.values,
                  MultiIndex.from_tuples(values, level=self._levels).values)

and by changing from_tuples to raise only when names is not available to infer the number of levels:

if len(tuples) == 0 and names is None:
    raise TypeError('Cannot infer number of levels from empty list')
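The idea behind that guard can be shown in isolation. This is a rough pure-Python sketch of the inference rule only, not the pandas implementation; `infer_nlevels` is a hypothetical name introduced for illustration.

```python
def infer_nlevels(tuples, names=None):
    """Hypothetical helper: work out how many levels a list of tuples implies."""
    if len(tuples) == 0:
        if names is None:
            # No tuples and no names: there is nothing to infer from.
            raise TypeError('Cannot infer number of levels from empty list')
        # Empty input, but names tells us how many levels were intended.
        return len(names)
    # Non-empty input: the first tuple's length gives the level count.
    return len(tuples[0])


print(infer_nlevels([('foo', 'a'), ('bar', 'b')]))
print(infer_nlevels([], names=['first', 'second']))
```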

However, MultiIndex.from_arrays also rejects empty lists, and the change needed there is a bit ugly:

    if names is not None and len(names) > 0 and len(levels) == 0:
        # one empty list per level name
        levels = [[] for _ in names]
        labels = [[] for _ in names]
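One detail worth checking when building one empty list per level: in Python, `[] * n` is still an empty list, so `[[] * n]` holds a single inner list regardless of `n`, while a comprehension produces `n` separate ones. A small illustration (the `names` value is made up for the example):

```python
names = ['first', 'second']  # hypothetical level names

multiplied = [[] * len(names)]   # [] * 2 is [], so this is [[]]
per_level = [[] for _ in names]  # one independent empty list per name

print(len(multiplied))  # 1
print(len(per_level))   # 2
```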

Happy to turn this into a PR if there are no objections.

Output of pd.show_versions()

pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Darwin
OS-release: 16.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.20.2
pytest: 3.0.3
pip: 9.0.1
setuptools: 36.0.1
Cython: 0.25.2
numpy: 1.13.0
scipy: 0.19.1
xarray: None
IPython: 5.3.0
sphinx: 1.5.1
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.0
tables: None
numexpr: 2.6.1
feather: None
matplotlib: 2.0.0
openpyxl: 2.3.5
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.3
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: 1.1.11
pymysql: None
psycopg2: 2.7.1 (dt dec pq3 ext lo64)
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None