CLN: handle EAs and fast path (no bounds checking) in safe_sort by jorisvandenbossche · Pull Request #25696 · pandas-dev/pandas

This is a possible alternative solution to what we have been discussing in #25592.

This moves the logic into safe_sort, with:

check_outofbounds makes it a bit more complicated, but without it we can't benefit from the performance improvement for which take_1d was originally used in factorize.
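To make the trade-off concrete, here is a rough NumPy-only sketch of the idea. It is not the code in this PR: `sort_uniques_and_remap` is a hypothetical name, the `check_outofbounds` parameter here only mirrors the name used above, and EAs are not handled at all. It only shows how the codes are remapped after sorting the uniques, and where the bounds check can be skipped when the caller guarantees that the only out-of-range code is the sentinel. It does not demonstrate the actual speed-up.

```python
import numpy as np


def sort_uniques_and_remap(uniques, codes, na_sentinel=-1, check_outofbounds=True):
    """Hypothetical sketch: sort ``uniques`` and remap ``codes`` to match.

    Assumes ``uniques`` is non-empty and sortable by NumPy.
    """
    uniques = np.asarray(uniques)
    codes = np.asarray(codes, dtype=np.intp)

    # Stable sort order of the unique values.
    sorter = np.argsort(uniques, kind="mergesort")
    ordered = uniques[sorter]

    # reverse_indexer[old_position] == new_position
    reverse_indexer = np.empty(len(sorter), dtype=np.intp)
    reverse_indexer[sorter] = np.arange(len(sorter), dtype=np.intp)

    if check_outofbounds:
        # Safe path: any code outside [0, len(uniques)) is treated as missing.
        mask = (codes < 0) | (codes >= len(uniques))
    else:
        # Fast path: the caller guarantees that na_sentinel is the only
        # out-of-range code, so a single equality test is enough.
        mask = codes == na_sentinel

    # Point masked positions at 0 so indexing is valid, then overwrite them.
    safe_codes = np.where(mask, 0, codes)
    new_codes = reverse_indexer[safe_codes]
    new_codes[mask] = na_sentinel
    return ordered, new_codes


ordered, new_codes = sort_uniques_and_remap(["b", "a", "c"], [0, 1, 2, -1, 0])
print(ordered.tolist())    # ['a', 'b', 'c']
print(new_codes.tolist())  # [1, 0, 2, -1, 1]
```

With `check_outofbounds=False` the result is the same for well-formed input such as the output of factorize, which is the case where skipping the check would be safe.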

(Another solution is to decide that this performance improvement is not worth the extra code, and to use the current safe_sort, fixed to work for EAs, in factorize.)

Still need to add some more tests for the combination of EAs with a custom na_sentinel (a case that is currently broken).