ENH: better dtype inference when doing DataFrame reductions by topper-123 · Pull Request #52788 · pandas-dev/pandas

I've actually just this morning made a new version using _reduce only (i.e. scrapping _reduce_and_wrap). I prefer this new version (because we now have only one reduction method instead of two), but if the other is preferred, it is easy to revert.

I maintain backward compat in this new version by:

  1. making the keepdims parameter keyword-only
  2. only calling _reduce with keepdims=True if the _reduce method signature has a parameter named "keepdims"
  3. if the _reduce method signature does not have a parameter named "keepdims", calling _reduce without supplying keepdims, emitting a FutureWarning, and wrapping the (scalar) reduction result in an ndarray before passing it on.

This is possible because _reduce_and_wrap was actually only called inside the blk_func inside DataFrame._reduce, so by doing some introspection there we can keep backward compat. See new version:

def blk_func(values, axis: Axis = 1):
    if isinstance(values, ExtensionArray):
        if not is_1d_only_ea_dtype(values.dtype) and not isinstance(
            self._mgr, ArrayManager
        ):
            return values._reduce(name, axis=1, skipna=skipna, **kwds)
        # Introspect the signature to see whether this _reduce supports keepdims.
        sign = signature(values._reduce)
        if "keepdims" in sign.parameters:
            return values._reduce(name, skipna=skipna, keepdims=True, **kwds)
        else:
            warnings.warn(
                f"{type(values)}._reduce will require a `keepdims` parameter "
                "in the future",
                FutureWarning,
                stacklevel=find_stack_level(),
            )
            # Old-style _reduce returns a scalar, so wrap it ourselves.
            result = values._reduce(name, skipna=skipna, **kwds)
            return np.array([result])
    else:
        return op(values, axis=axis, skipna=skipna, **kwds)

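For anyone who wants to try the introspection pattern outside pandas, here is a minimal standalone sketch. The names OldStyleArray, NewStyleArray and reduce_keepdims are hypothetical and are not part of pandas. To keep the example stdlib-only, a one-element list stands in for np.array([result]).

```python
import inspect
import warnings


class OldStyleArray:
    """Hypothetical array whose _reduce predates the keepdims parameter."""

    def __init__(self, data):
        self.data = list(data)

    def _reduce(self, name, *, skipna=True, **kwargs):
        # Only "sum" is supported, to keep the sketch short.
        if name != "sum":
            raise NotImplementedError(name)
        return sum(self.data)


class NewStyleArray(OldStyleArray):
    """Hypothetical array whose _reduce accepts keepdims (keyword-only)."""

    def _reduce(self, name, *, skipna=True, keepdims=False, **kwargs):
        result = super()._reduce(name, skipna=skipna, **kwargs)
        # A one-element list stands in for a 1-element ndarray here.
        return [result] if keepdims else result


def reduce_keepdims(values, name, skipna=True, **kwds):
    """Call values._reduce so that the result is always wrapped."""
    params = inspect.signature(values._reduce).parameters
    if "keepdims" in params:
        return values._reduce(name, skipna=skipna, keepdims=True, **kwds)
    # Old signature: warn, call without keepdims, and wrap the scalar ourselves.
    warnings.warn(
        f"{type(values).__name__}._reduce will require a `keepdims` "
        "parameter in the future",
        FutureWarning,
        stacklevel=2,
    )
    return [values._reduce(name, skipna=skipna, **kwds)]
```

Note that the check looks for a parameter literally named "keepdims", so in this sketch a _reduce that only accepts **kwargs is still treated as old-style and triggers the warning.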

Notice especially the FutureWarning starting on line 10885. It lets us avoid requiring keepdims now, even though keepdims is in the signature of ExtensionArray._reduce. In v3.0, we will drop the signature check and always call values._reduce with keepdims=True, so a _reduce method without a keepdims parameter will fail in v3.0.
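To illustrate the failure mode once the signature check is gone, here is a small sketch with a hypothetical LegacyArray class and helper (neither is pandas code). It assumes the legacy _reduce does not accept arbitrary keyword arguments. Under that assumption, passing keepdims=True unconditionally raises a TypeError.

```python
class LegacyArray:
    """Hypothetical array whose _reduce has no keepdims parameter."""

    def __init__(self, data):
        self.data = list(data)

    def _reduce(self, name, *, skipna=True):
        # Only "sum" is supported, to keep the sketch short.
        if name != "sum":
            raise NotImplementedError(name)
        return sum(self.data)


def reduce_without_introspection(values, name, skipna=True):
    # No signature check: keepdims=True is always passed.
    return values._reduce(name, skipna=skipna, keepdims=True)


try:
    reduce_without_introspection(LegacyArray([1, 2, 3]), "sum")
except TypeError as exc:
    outcome = f"TypeError: {exc}"
else:
    outcome = "no error"

print(outcome)
```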

Check out the test_reduction_without_keepdims test in pandas/tests/extension/decimal/test_decimal.py for a test of what happens when an ExtensionArray doesn't have a keepdims parameter in its _reduce method.

Thoughts? I prefer this new version, but it is easy to revert if needed.