API: IntervalIndex.get_indexer not strict about passed target values dtype · Issue #47772 · pandas-dev/pandas
Consider the following example of IntervalIndex with datetime64 subdtype:
In [40]: iidx = pd.IntervalIndex.from_breaks(pd.date_range("2018-01-01", periods=4))
In [41]: iidx
Out[41]: IntervalIndex([(2018-01-01, 2018-01-02], (2018-01-02, 2018-01-03], (2018-01-03, 2018-01-04]], dtype='interval[datetime64[ns], right]')
In [42]: iidx.get_indexer([pd.Timestamp("2018-01-02")])
Out[42]: array([0])
In [43]: iidx.get_indexer(["2018-01-02"])
Out[43]: array([0])
In [44]: iidx.get_indexer([pd.Timestamp("2018-01-02").value])
Out[44]: array([0])
(The above is with pandas 1.3.5. On 1.4 / main, the first two still work, but the last one no longer does.)
Being able to index with strings (in addition to Timestamp / datetime64 values) is probably expected? (since that also seems to work like that for DatetimeIndex)
But we shouldn't accept integer values, I think? (This could also be deprecated first, since it also impacts the behaviour of .loc indexing.)
This last case was changed (unintentionally, I think, given there were no tests) in #47771, and I am changing this back in #47771 to fix a cut regression (and implicitly also restoring the get_indexer behaviour).
If we want to remove this again (or deprecate first), we have to change the logic inside cut a bit to ensure we pass correctly dtyped values to IntervalIndex.get_indexer (see the explanation in the top post of #47771 for context).
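To illustrate what "passing correctly dtyped values" could look like on the caller side, here is a minimal sketch. The helper name `get_indexer_strict` is hypothetical and not part of pandas, and this is not the actual logic inside cut. It assumes a timezone-naive datetime64 subtype, rejects raw integers, and coerces strings / Timestamps to the interval's subtype before calling get_indexer.

```python
import numpy as np
import pandas as pd


def get_indexer_strict(iidx, values):
    """Hypothetical helper (not a pandas API): look up `values` in `iidx`,
    refusing raw integers when the intervals hold datetime64 data."""
    subtype = iidx.dtype.subtype
    if subtype.kind == "M":  # datetime64 subtype (assumed tz-naive here)
        # Reject plain integers such as Timestamp.value instead of
        # silently interpreting them as nanoseconds since the epoch.
        if any(isinstance(v, (int, np.integer)) for v in values):
            raise TypeError(
                "integer targets are not accepted for datetime64 intervals"
            )
        # Coerce strings / Timestamps to the same dtype as the breaks.
        values = pd.to_datetime(values).astype(subtype)
    return iidx.get_indexer(values)


iidx = pd.IntervalIndex.from_breaks(pd.date_range("2018-01-01", periods=4))

# Strings and Timestamps are coerced and then looked up.
from_string = get_indexer_strict(iidx, ["2018-01-02"])
from_timestamp = get_indexer_strict(iidx, [pd.Timestamp("2018-01-02")])

# Raw integers are rejected up front.
try:
    get_indexer_strict(iidx, [pd.Timestamp("2018-01-02").value])
    integer_rejected = False
except TypeError:
    integer_rejected = True
```

Doing the coercion before the call keeps get_indexer itself free to become strict (or to deprecate integer targets) without affecting callers that already pass datetime-like values.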