CLN: handle EAs and fast path (no bounds checking) in safe_sort by jorisvandenbossche · Pull Request #25696 · pandas-dev/pandas
This is a possible alternative solution to what we have been discussing in #25592.
This moves the logic into `safe_sort`, with:
- adding a `check_outofbounds` keyword to disable the extra checks (otherwise the performance benefit of `take_1d` is lost)
- fixing `safe_sort` to work for EAs
The `check_outofbounds` keyword makes it a bit more complicated, but without it we can't benefit from the performance improvement for which `take_1d` was originally used in `factorize`.
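To illustrate the trade-off, here is a minimal NumPy-only sketch of the two paths. It is not the code in this PR and does not call pandas internals: `sort_and_remap` is a hypothetical helper, and the fast path assumes the caller guarantees that every code is either a valid position or exactly `-1` (the kind of guarantee `factorize` could give for codes it produced itself).

```python
import numpy as np


def sort_and_remap(uniques, codes, na_sentinel=-1, check_outofbounds=True):
    """Sort `uniques` and remap `codes` so they point into the sorted result.

    Hypothetical helper for illustration only, not the pandas implementation.
    """
    uniques = np.asarray(uniques)
    codes = np.asarray(codes, dtype=np.intp)

    sorter = np.argsort(uniques, kind="stable")
    sorted_uniques = uniques[sorter]

    # reverse_indexer[old_position] == new_position after sorting
    reverse_indexer = np.empty(len(sorter), dtype=np.intp)
    reverse_indexer[sorter] = np.arange(len(sorter), dtype=np.intp)

    if check_outofbounds:
        # Safe path: build a mask of every code outside [0, n), look up with
        # wrapping so nothing raises, then overwrite the masked positions.
        mask = (codes < 0) | (codes >= len(uniques))
        new_codes = reverse_indexer.take(codes, mode="wrap")
        new_codes[mask] = na_sentinel
    else:
        # Fast path (assumption: codes are in [0, n) or exactly -1).
        # Appending the sentinel lets index -1 pick it up directly,
        # so no mask has to be computed.
        lookup = np.append(reverse_indexer, na_sentinel)
        new_codes = lookup[codes]

    return sorted_uniques, new_codes


uniques = np.array(["b", "a", "c"])
codes = np.array([0, 1, 2, -1, 0])

checked = sort_and_remap(uniques, codes, check_outofbounds=True)
fast = sort_and_remap(uniques, codes, check_outofbounds=False)
```

Both calls give the same sorted uniques and remapped codes for well-formed input; only the checked path also tolerates a stray code such as `5`, which it maps to the sentinel.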
(Another solution is to decide that this performance improvement is not worth the extra code, and simply use the current `safe_sort`, fixed to work for EAs, in `factorize`.)
Need to add some more tests for the combination of EAs with a custom na_sentinel (a case that is currently broken)