BUG: join operation fails on overlapping IntervalIndex levels · Issue #45661 · pandas-dev/pandas

Pandas version checks

Reproducible Example

import pandas as pd

range_index = pd.RangeIndex(3, name="range_index")

interval_index = pd.IntervalIndex.from_tuples([ (0.0, 1.0), (1.0, 2.0), (1.5, 2.5) ], name='interval_index')

multi_index = pd.MultiIndex.from_product([interval_index, range_index])

print(interval_index.join(multi_index))

# This causes the same issue

print(multi_index.join(interval_index))

Issue Description

Observed output:

Traceback (most recent call last):
  File "/home/jmu3si/tmp/join_index_flipped.py", line 11, in <module>
    print(interval_index.join(multi_index))
  File "/home/jmu3si/miniconda3/envs/myroot/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 216, in join
    join_index, lidx, ridx = meth(self, other, how=how, level=level, sort=sort)
  File "/home/jmu3si/miniconda3/envs/myroot/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 4368, in join
    return self._join_multi(other, how=how)
  File "/home/jmu3si/miniconda3/envs/myroot/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 4531, in _join_multi
    result = self._join_level(other, level, how=how)
  File "/home/jmu3si/miniconda3/envs/myroot/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 4633, in _join_level
    new_level, left_lev_indexer, right_lev_indexer = old_level.join(
  File "/home/jmu3si/miniconda3/envs/myroot/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 216, in join
    join_index, lidx, ridx = meth(self, other, how=how, level=level, sort=sort)
  File "/home/jmu3si/miniconda3/envs/myroot/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 4426, in join
    return self._join_via_get_indexer(other, how, sort)
  File "/home/jmu3si/miniconda3/envs/myroot/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 4456, in _join_via_get_indexer
    lindexer = self.get_indexer(join_index)
  File "/home/jmu3si/miniconda3/envs/myroot/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3721, in get_indexer
    raise InvalidIndexError(self._requires_unique_msg)
pandas.errors.InvalidIndexError: cannot handle overlapping indices; use IntervalIndex.get_indexer_non_unique

The join fails because get_indexer() raises InvalidIndexError when the IntervalIndex contains overlapping intervals (here (1.0, 2.0] and (1.5, 2.5]). This is very similar to #44096. The difference is probably that here we are not joining two MultiIndexes: one side is a plain IntervalIndex.
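To isolate the failing step from the join itself, here is a minimal sketch that calls get_indexer() directly on the overlapping IntervalIndex from the example. It assumes the uniqueness check seen in the traceback is still in place in the installed pandas version; the variable get_indexer_raised is only an illustrative flag, not part of pandas.

```python
import pandas as pd
from pandas.errors import InvalidIndexError

# Same intervals as in the report; (1.0, 2.0] and (1.5, 2.5] overlap.
interval_index = pd.IntervalIndex.from_tuples(
    [(0.0, 1.0), (1.0, 2.0), (1.5, 2.5)], name="interval_index"
)
print(interval_index.is_overlapping)

# get_indexer() requires a non-overlapping index, so it is expected to raise.
try:
    interval_index.get_indexer(interval_index)
    get_indexer_raised = False
except InvalidIndexError as err:
    get_indexer_raised = True
    print(f"get_indexer raised: {err}")

# The method named in the error message accepts overlapping intervals.
indexer, missing = interval_index.get_indexer_non_unique(interval_index)
print(indexer, missing)
```

If this behaves as described, it suggests the join path could fall back to get_indexer_non_unique() for overlapping levels, though whether that is the right fix is for the maintainers to decide.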

Expected Behavior

Expected output:

MultiIndex([((0.0, 1.0], 0),
            ((0.0, 1.0], 1),
            ((0.0, 1.0], 2),
            ((1.0, 2.0], 0),
            ((1.0, 2.0], 1),
            ((1.0, 2.0], 2),
            ((1.5, 2.5], 0),
            ((1.5, 2.5], 1),
            ((1.5, 2.5], 2)],
           names=['interval_index', 'range_index'])
MultiIndex([((0.0, 1.0], 0),
            ((0.0, 1.0], 1),
            ((0.0, 1.0], 2),
            ((1.0, 2.0], 0),
            ((1.0, 2.0], 1),
            ((1.0, 2.0], 2),
            ((1.5, 2.5], 0),
            ((1.5, 2.5], 1),
            ((1.5, 2.5], 2)],
           names=['interval_index', 'range_index'])

Installed Versions

INSTALLED VERSIONS
------------------
commit : bb1f651
python : 3.9.7.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-96-lowlatency
Version : #109-Ubuntu SMP PREEMPT Wed Jan 12 17:51:01 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : de_DE.UTF-8
LOCALE : de_DE.UTF-8

pandas : 1.4.0
numpy : 1.19.5
pytz : 2021.1
dateutil : 2.8.2
pip : 21.2.4
setuptools : 58.0.4
Cython : 0.29.24
pytest : None
hypothesis : None
sphinx : 4.2.0
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.0.1
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.5.0
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.7.3
sqlalchemy : None
tables : None
tabulate : None
xarray : 0.20.1
xlrd : None
xlwt : None
zstandard : None