BUG: Some `Index` reductions fail with `string[pyarrow]` · Issue #51397 · pandas-dev/pandas

Pandas version checks

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

x = pd.Index(['a', 'b', 'a'], dtype=object)
print(f"{x.min() = }")  # This works

y = pd.Index(['a', 'b', 'a'], dtype="string[python]")
print(f"{y.min() = }")  # This works

z = pd.Index(['a', 'b', 'a'], dtype="string[pyarrow]")
print(f"{z.min() = }")  # This errors

Issue Description

When an Index with string[pyarrow] data contains repeated data, for example pd.Index(['a', 'b', 'a'], dtype="string[pyarrow]"), .min() / .max() fail.

The above code snippet fails with

Traceback (most recent call last):
  File "/Users/james/projects/dask/dask/string-min.py", line 11, in <module>
    print(f"{z.min() = }")
  File "/Users/james/mambaforge/envs/dask-py39/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 7225, in min
    return self._values.min(skipna=skipna)  # type: ignore[attr-defined]
AttributeError: 'ArrowStringArray' object has no attribute 'min'

Expected Behavior

I'd expect using string[pyarrow] data to return the same result as object and string[python].
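
To make the expected parity concrete, here is a minimal sketch that runs the same reduction across all three dtypes and records either the result or the exception. The helper name `min_by_dtype` is hypothetical and not part of pandas. The `ImportError` branch is an assumption that covers environments where pyarrow is not installed.

```python
import pandas as pd


def min_by_dtype(values, dtypes=("object", "string[python]", "string[pyarrow]")):
    """Return {dtype: min value, or a short description of the exception raised}.

    Hypothetical helper for illustration only; not a pandas API.
    """
    results = {}
    for dtype in dtypes:
        try:
            results[dtype] = pd.Index(values, dtype=dtype).min()
        except (AttributeError, ImportError, TypeError) as exc:
            # AttributeError is the failure reported in this issue;
            # ImportError is expected when pyarrow is not installed.
            results[dtype] = f"{type(exc).__name__}: {exc}"
    return results


results = min_by_dtype(["a", "b", "a"])
for dtype, result in results.items():
    print(f"{dtype:>16}: {result!r}")
```

On a pandas version affected by this bug, the `string[pyarrow]` entry should show the `AttributeError` from the traceback above, while the other two entries agree.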

cc @phofl @mroeschke

Installed Versions