REGR: fix pd.cut with datetime IntervalIndex as bins by jorisvandenbossche · Pull Request #47771 · pandas-dev/pandas
Closes #46218
xref #46218 (comment) for the actual reason `cut` fails: inside the implementation, we convert the timestamp data to floats (to pass to the underlying algorithm), but when those values are then passed to `IntervalIndex.get_indexer`, they no longer "match" the datetime64 interval dtype. In theory, `get_indexer` should then fail (return -1 for not finding the target values), but until #42227 this actually happily worked (and therefore also let `cut` with datetime64 interval bins work).
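To make the scenario concrete, here is a minimal sketch of the affected call. The bin edges and sample timestamps are made up for illustration; only `pd.cut`, `pd.IntervalIndex.from_breaks` and `IntervalIndex.get_indexer` are actual pandas API.

```python
import numpy as np
import pandas as pd

# Hypothetical bin edges: four consecutive midnights -> three one-day bins,
# right-closed by default: (Jan 1, Jan 2], (Jan 2, Jan 3], (Jan 3, Jan 4].
edges = pd.date_range("2022-01-01", periods=4, freq="D")
bins = pd.IntervalIndex.from_breaks(edges)

# Hypothetical data, built from the first edge so the datetime64 dtype
# matches the bins: +12h (first bin), +60h (third bin), +30 days (no bin).
offsets = np.array([12, 60, 24 * 30], dtype="timedelta64[h]")
stamps = pd.DatetimeIndex(edges.values[0] + offsets)

# The call this PR is about: datetime values cut into datetime interval bins.
result = pd.cut(pd.Series(stamps), bins=bins)
cut_codes = result.cat.codes.tolist()

# Direct lookup with a target whose dtype matches the interval subtype.
lookup_codes = bins.get_indexer(stamps).tolist()

print(cut_codes)
print(lookup_codes)
```

With a pandas version that includes this fix, both lists should come out as `[0, 2, -1]`, where -1 marks the value that falls outside every bin. On a version with the regression, the `pd.cut` line is the one that misbehaves (see #46218).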
This PR doesn't solve the root cause (we should change the logic inside `cut` so that we don't create this mismatch between values and bins), but it is a short-term fix for the regression. It basically reverts the (unintended, I think) behaviour change introduced by #42227 without actually reverting that PR: I am keeping the refactor from that PR that introduced `_should_partial_index`, and only changing `_should_partial_index` itself a little so that it better matches what happened before in practice.
I will open a separate issue about the problem in `cut` and about making `IntervalIndex.get_indexer` more strict. (opened -> #47772)
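As a rough illustration of what "more strict" could mean, the sketch below wraps `get_indexer` in a hypothetical helper, `strict_get_indexer` (not a pandas function), that reports "not found" whenever the target dtype differs from the interval subtype. Exact dtype equality is a simplifying assumption made here; the change tracked in #47772 may well use a looser comparison.

```python
import numpy as np
import pandas as pd


def strict_get_indexer(bins: pd.IntervalIndex, target: pd.Index) -> np.ndarray:
    """Hypothetical helper: return -1 everywhere unless the dtypes agree."""
    if target.dtype != bins.dtype.subtype:
        # Values of another dtype cannot "match" the intervals.
        return np.full(len(target), -1, dtype=np.intp)
    return bins.get_indexer(target)


# Hypothetical bins: three one-day, right-closed intervals.
edges = pd.date_range("2022-01-01", periods=4, freq="D")
bins = pd.IntervalIndex.from_breaks(edges)

# Noon on the first two days, once as datetimes and once as plain floats
# (the kind of numeric view that cut builds internally, per the description).
stamps = pd.DatetimeIndex(edges.values[:2] + np.timedelta64(12, "h"))
as_floats = pd.Index(stamps.asi8.astype("float64"))

same_dtype = strict_get_indexer(bins, stamps).tolist()
other_dtype = strict_get_indexer(bins, as_floats).tolist()

print(same_dtype)
print(other_dtype)
```

Under this stricter rule the datetime target still resolves to its bins, while the float view of the same instants finds nothing, which is why `cut` itself would need to stop creating the mismatch first.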