CLN: handle EAs and fast path (no bounds checking) in safe_sort by jorisvandenbossche · Pull Request #25696 · pandas-dev/pandas
This is a possible alternative solution to what we have been discussing in #25592.
This moves the logic into `safe_sort`, with:

- adding a `check_outofbounds` keyword to disable extra checks (otherwise the performance benefit of `take_1d` is lost)
- fixing `safe_sort` to work for EAs

The `check_outofbounds` keyword makes it a bit more complicated, but without it, we can't benefit from the performance improvement for which `take_1d` was originally used in `factorize`.
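
To make the trade-off concrete, here is a minimal pure-Python sketch of the idea, not the pandas implementation. The function name `sort_and_remap` is hypothetical; only the keyword names `check_outofbounds` and `na_sentinel` come from this PR. It sorts the unique values, remaps the codes to the new positions, and optionally skips the bounds check when the caller already knows every code is either valid or the sentinel.

```python
def sort_and_remap(values, codes, na_sentinel=-1, check_outofbounds=True):
    """Sort `values` and remap `codes` so they still point at the same items.

    Hypothetical illustration only; assumes `values` are unique and sortable.
    """
    n = len(values)
    # order[new_position] = old_position
    order = sorted(range(n), key=values.__getitem__)
    sorted_values = [values[i] for i in order]

    # reverse_indexer[old_position] = new_position
    reverse_indexer = [0] * n
    for new_pos, old_pos in enumerate(order):
        reverse_indexer[old_pos] = new_pos

    new_codes = []
    for code in codes:
        if code == na_sentinel:
            # Missing values keep the sentinel.
            new_codes.append(na_sentinel)
        elif check_outofbounds and not (-n <= code < n):
            # Slow path: codes outside [-n, n) are treated as missing.
            new_codes.append(na_sentinel)
        else:
            # Fast path: trust the caller that the code is in bounds.
            new_codes.append(reverse_indexer[code])
    return sorted_values, new_codes


values = ["b", "a", "c"]

# With the check, the out-of-bounds code 5 is mapped to the sentinel.
checked = sort_and_remap(values, [0, 1, 2, -1, 0, 5])

# Without the check, the caller guarantees the codes are valid
# (as a factorize-like caller can, since it produced them itself).
unchecked = sort_and_remap(values, [0, 1, 2, -1, 0], check_outofbounds=False)
```

In this sketch, passing an out-of-bounds code with `check_outofbounds=False` raises an `IndexError`, which is why the unchecked path is only appropriate for callers that generated the codes themselves.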

(Another solution is to decide that this performance improvement is not worth the extra code, and to use the current `safe_sort`, fixed to work for EAs, in `factorize`.)

Need to add some more tests for the combination of EAs with a custom `na_sentinel` (a case that is currently broken).