[PERF] Get rid of MultiIndex conversion in IntervalIndex.intersection by makbigc · Pull Request #26225 · pandas-dev/pandas (original) (raw)
A few things after poking around at this a bit.
It looks like this isn't behaving as expected for nested intervals:
In [1]: import pandas as pd; pd.version Out[1]: '0.25.0.dev0+472.ge9ddf02b5'
In [2]: ii1 = pd.IntervalIndex.from_tuples([(1, 2), (1, 3), (1, 4)])
In [3]: ii2 = pd.IntervalIndex.from_tuples([(1, 2), (1, 3)])
In [4]: ii1.intersection(ii2) Out[4]: IntervalIndex([], closed='right', dtype='interval[int64]')
Haven't looked into this enough to know exactly where it's failing.
It also looks like this fixes an issue that was present in the previous implementation of intersection
(probably a bug with MultiIndex.intersection
), specifically in relation to how duplicates are handled:
In [5]: ii1 = pd.IntervalIndex.from_tuples([(1, 2), (1, 2), (2, 3), (3, 4)])
In [6]: ii2 = pd.IntervalIndex.from_tuples([(1, 2), (2, 3)])
In [7]: ii1.intersection(ii2) Out[7]: IntervalIndex([(1, 2], (1, 2], (2, 3]], closed='right', dtype='interval[int64]')
On master this only returns one instance of (1, 2]
but returning both dupes is consistent with other index types, so I think the new behavior is correct:
In [8]: idx1 = pd.Index(list('aabc'))
In [9]: idx2 = pd.Index(list('ab'))
In [10]: idx1.intersection(idx2) Out[10]: Index(['a', 'a', 'b'], dtype='object')
It'd be nice to include both the failing and newly passing examples above as test cases. I wouldn't be opposed to creating a test_setops.py
file and moving all the setops tests there, as a similar file exists for other index types.