BUG: Some Index
reductions fail with string[pyarrow]
· Issue #51397 · pandas-dev/pandas (original) (raw)
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
x = pd.Index(['a', 'b', 'a'], dtype=object) print(f"{x.min() = }") # This works
y = pd.Index(['a', 'b', 'a'], dtype="string[python]") print(f"{y.min() = }") # This works
z = pd.Index(['a', 'b', 'a'], dtype="string[pyarrow]") print(f"{z.min() = }") # This errors
Issue Description
When an Index
with string[pyarrow]
data contains repeated data, for example pd.Index(['a', 'b', 'a'], dtype="string[pyarrow]")
, .min()
/ .max()
fail.
The above code snippet fails with
Traceback (most recent call last): File "/Users/james/projects/dask/dask/string-min.py", line 11, in print(f"{z.min() = }") File "/Users/james/mambaforge/envs/dask-py39/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 7225, in min return self._values.min(skipna=skipna) # type: ignore[attr-defined] AttributeError: 'ArrowStringArray' object has no attribute 'min'
Expected Behavior
I'd expect using string[pyarrow]
data to return the same result as object
and string[python]
.
cc @phofl @mroeschke
Installed Versions
INSTALLED VERSIONS
------------------
commit : 8dab54d6573f7186ff0c3b6364d5e4dd635ff3e7
python : 3.9.15.final.0
python-bits : 64
OS : Darwin
OS-release : 22.2.0
Version : Darwin Kernel Version 22.2.0: Fri Nov 11 02:08:47 PST 2022; root:xnu-8792.61.2~4/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.5.2
numpy : 1.24.0
pytz : 2022.6
dateutil : 2.8.2
setuptools : 59.8.0
pip : 22.3.1
Cython : None
pytest : 7.2.0
hypothesis : None
sphinx : 4.5.0
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.7.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : None
brotli :
fastparquet : 2023.1.0
fsspec : 2022.11.0
gcsfs : None
matplotlib : 3.6.2
numba : None
numexpr : 2.8.3
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 10.0.1
pyreadstat : None
pyxlsb : None
s3fs : 2022.11.0
scipy : 1.9.3
snappy :
sqlalchemy : 1.4.46
tables : 3.7.0
tabulate : None
xarray : 2022.9.0
xlrd : None
xlwt : None
zstandard : None
tzdata : None