Handle ExtensionArrays in cut

Followup to #31290. Currently pd.cut doesn't play nicely with all extension arrays. To support them, I think we'll need one addition to the interface.

We need an array of integers to pass to searchsorted in

ids = ensure_int64(bins.searchsorted(x, side=side))

I think the only requirement is that the integer-encoded values have the same ordering as the original values. (I believe the math term for this type of mapping is order-preserving, i.e. strictly increasing.)

It doesn't matter what value is used for missing values, as long as it's distinct.
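To make the requirement concrete, here is a rough sketch of one way such an encoding could be computed. The helper name `order_preserving_codes` and the `na_code` parameter are made up for illustration and are not part of pandas or the ExtensionArray interface; the sketch also assumes the non-missing values can be sorted by `np.unique`, which will not hold for every extension type.

```python
import numpy as np
import pandas as pd


def order_preserving_codes(values, na_code=-1):
    """Hypothetical helper: map values to int64 codes with the same ordering.

    Missing values all receive ``na_code``, which is distinct from every
    valid code because valid codes start at 0.
    """
    ser = pd.Series(values)
    mask = ser.isna().to_numpy()
    codes = np.full(len(ser), na_code, dtype=np.int64)
    # np.unique sorts the distinct values; ``inverse`` is then the rank of
    # each element among those sorted distinct values.
    _, inverse = np.unique(ser[~mask].to_numpy(), return_inverse=True)
    codes[~mask] = inverse
    return codes


codes = order_preserving_codes(pd.array([30, 10, None, 20, 10], dtype="Int64"))
print(codes)
```

Equal inputs get equal codes and smaller inputs get smaller codes, which is all `searchsorted` should need, provided the bins are encoded with the same mapping.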

We can't quite use factorize(arr)[0], since by default it assigns codes in order of first appearance, so the codes don't satisfy the ordering requirement.
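A small illustration of the problem, using a plain NumPy array rather than an extension array to keep it simple. The `same_ordering` check is a hypothetical helper written for this example. For NumPy input, `pd.factorize(..., sort=True)` does give order-preserving codes; whether that is a suitable route for every extension array is a separate question this sketch does not answer.

```python
import numpy as np
import pandas as pd

values = np.array([30, 10, 20, 10])

# Default: codes follow order of first appearance, not value order.
codes_default, uniques_default = pd.factorize(values)

# sort=True: uniques are sorted, so codes follow value order.
codes_sorted, uniques_sorted = pd.factorize(values, sort=True)


def same_ordering(original, codes):
    """Hypothetical check: do the codes sort the same way as the originals?"""
    return np.array_equal(
        np.argsort(original, kind="stable"),
        np.argsort(codes, kind="stable"),
    )


print(codes_default, same_ordering(values, codes_default))
print(codes_sorted, same_ordering(values, codes_sorted))
```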