ENH: Support min/max on ArrowStringArray · Issue #42597 · pandas-dev/pandas

In order for Dask to perform large shuffles (set_index, join on a non-index column, ...) on a column it needs to be able to compute quantiles.

To do this it is useful to compute min/max values.

When I try to compute min/max on columns of type string[pyarrow], I get the following exception:

~/miniconda/lib/python3.8/site-packages/pandas/core/generic.py in min(self, axis, skipna, level, numeric_only, **kwargs)
    10825         )
    10826         def min(self, axis=None, skipna=None, level=None, numeric_only=None, **kwargs):
    10827             return NDFrame.min(self, axis, skipna, level, numeric_only, **kwargs)
    10828
    10829         setattr(cls, "min", min)

~/miniconda/lib/python3.8/site-packages/pandas/core/generic.py in min(self, axis, skipna, level, numeric_only, **kwargs)
    10348
    10349     def min(self, axis=None, skipna=None, level=None, numeric_only=None, **kwargs):
    10350         return self._stat_function(
    10351             "min", nanops.nanmin, axis, skipna, level, numeric_only, **kwargs
    10352         )

~/miniconda/lib/python3.8/site-packages/pandas/core/generic.py in _stat_function(self, name, func, axis, skipna, level, numeric_only, **kwargs)
    10343                 name, axis=axis, level=level, skipna=skipna, numeric_only=numeric_only
    10344             )
    10345         return self._reduce(
    10346             func, name=name, axis=axis, skipna=skipna, numeric_only=numeric_only
    10347         )

~/miniconda/lib/python3.8/site-packages/pandas/core/series.py in _reduce(self, op, name, axis, skipna, numeric_only, filter_type, **kwds)
     4380         if isinstance(delegate, ExtensionArray):
     4381             # dispatch to ExtensionArray interface
  -> 4382             return delegate._reduce(name, skipna=skipna, **kwds)
     4383
     4384         else:

~/miniconda/lib/python3.8/site-packages/pandas/core/arrays/string_arrow.py in _reduce(self, name, skipna, **kwargs)
      377     def _reduce(self, name: str, skipna: bool = True, **kwargs):
      378         if name in ["min", "max"]:
  --> 379             return getattr(self, name)(skipna=skipna)
      380
      381         raise TypeError(f"Cannot perform reduction '{name}' with string dtype")

AttributeError: 'ArrowStringArray' object has no attribute 'min'

I am hopeful that Arrow already has a min/max implementation and that it just hasn't been hooked up to ArrowStringArray yet.