
Noticed in #59758 (comment)

This partially fixes a bug: in cases where the result actually needs to be a float but we try to convert it back to an int, we currently get an error:

    In [58]: pd.Series([2, 1, 1]).rank()
    Out[58]:
    0    3.0
    1    1.5
    2    1.5
    dtype: float64

    In [59]: pd.Series(["2", "1", "1"], dtype="string[pyarrow]").rank()
    ...
    File ~/scipy/repos/pandas/pandas/core/arrays/string_arrow.py:445, in _convert_int_result(self, result)

    File ~/scipy/repos/pandas/pandas/core/arrays/numeric.py:93, in NumericDtype.from_arrow(self, array)
    ---> 93 array = array.cast(pyarrow_type)
    ...
    ArrowInvalid: Float value 1.5 was truncated converting to int64

But in general we should also decide what to return here. For our default dtypes, it seems we decided in the past to simply always return float64, even for rank methods that could return ints.
For the ArrowDtype, then, it was decided to keep the dtype returned by pyarrow: #50264
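As a small illustration of the default-dtype behaviour described above, the sketch below ranks an int64 Series with two methods. It only covers the NumPy-backed default dtype; the ArrowDtype case is not shown here.

```python
import pandas as pd

ser = pd.Series([2, 1, 1])

# "average" gives tied values the mean of their ranks, so floats are needed.
average_ranks = ser.rank(method="average")

# "first" breaks ties by order of appearance, so every rank is a whole number,
# yet the result is still float64.
first_ranks = ser.rank(method="first")

print(average_ranks.dtype, average_ranks.tolist())
print(first_ranks.dtype, first_ranks.tolist())
```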

For now, this PR updates StringDtype to simply always return float64, so that the pyarrow and python storage behave consistently. But for those newer dtypes we could also consider actually keeping the distinction between float and int results (just not always int, as is done now).
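To make the alternative concrete, here is a rough sketch of what "keeping the distinction" could look like from the user's side. `rank_keeping_ints` and `INTEGER_RANK_METHODS` are hypothetical names, not part of pandas or of this PR, and the choice of the nullable `Int64` dtype for integer results is an assumption made for the example.

```python
import pandas as pd

# Rank methods whose results are always whole numbers (when pct=False).
INTEGER_RANK_METHODS = {"min", "max", "first", "dense"}


def rank_keeping_ints(ser: pd.Series, method: str = "average", pct: bool = False) -> pd.Series:
    """Hypothetical helper: return integer ranks where the method allows it."""
    ranks = ser.rank(method=method, pct=pct)
    if method in INTEGER_RANK_METHODS and not pct:
        # Nullable Int64 so that missing values (NaN ranks) become <NA>
        # instead of making the cast fail.
        return ranks.astype("Int64")
    # "average" and percentage ranks can be fractional, so stay float.
    return ranks


ser = pd.Series([2, 1, 1])
min_ranks = rank_keeping_ints(ser, method="min")
avg_ranks = rank_keeping_ints(ser)
print(min_ranks.dtype, min_ranks.tolist())
print(avg_ranks.dtype, avg_ranks.tolist())
```

The point of the sketch is only that the result dtype would depend on `method` and `pct`, rather than being always int (the current StringDtype behaviour) or always float64 (what this PR does).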