ENH: better dtype inference when doing DataFrame reductions by topper-123 · Pull Request #52788 · pandas-dev/pandas
I've actually just this morning made a new version using `_reduce` only (i.e. scrapping `_reduce_and_wrap`). I prefer this new version (because we now have only one reduction method instead of two), but if the other is preferred, it is easy to revert.
I maintain backward compat in this new version by:
- making the `keepdims` parameter keyword-only
- only calling `_reduce` with `keepdims=True` if the `_reduce` method signature has a parameter named "keepdims"
- if the `_reduce` method signature does not have a parameter named "keepdims", calling `_reduce` without supplying the `keepdims` parameter, emitting a `FutureWarning`, and wrapping the (scalar) reduction result in an ndarray before passing it on.
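
To make the three points above concrete, here is a minimal, self-contained sketch of the signature check outside of pandas. `LegacyArray`, `ModernArray` and `reduce_keepdims` are hypothetical names used only for illustration. A plain list stands in for the `np.array([result])` wrapping, and `stacklevel=2` stands in for `find_stack_level()`, so treat this as an approximation of the idea rather than the code in the PR.

```python
import warnings
from inspect import signature


class LegacyArray:
    """Hypothetical array whose _reduce has no `keepdims` parameter."""

    def __init__(self, data):
        self.data = list(data)

    def _reduce(self, name, *, skipna=True, **kwargs):
        if name != "sum":
            raise NotImplementedError(name)
        return sum(self.data)


class ModernArray(LegacyArray):
    """Hypothetical array whose _reduce accepts keyword-only `keepdims`."""

    def _reduce(self, name, *, skipna=True, keepdims=False, **kwargs):
        result = super()._reduce(name, skipna=skipna, **kwargs)
        # A one-element list stands in for a length-1 ndarray here.
        return [result] if keepdims else result


def reduce_keepdims(values, name, skipna=True, **kwds):
    # Only pass keepdims=True if the signature names such a parameter.
    if "keepdims" in signature(values._reduce).parameters:
        return values._reduce(name, skipna=skipna, keepdims=True, **kwds)
    # Otherwise warn, call without keepdims, and wrap the scalar ourselves.
    warnings.warn(
        f"{type(values).__name__}._reduce will require a `keepdims` "
        "parameter in the future",
        FutureWarning,
        stacklevel=2,
    )
    return [values._reduce(name, skipna=skipna, **kwds)]
```

Note that a `_reduce` taking only `**kwargs` counts as "no keepdims" under this check, because `signature(...).parameters` is searched for the literal name `keepdims`.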
This is possible because `_reduce_and_wrap` was actually only called inside the `blk_func` inside `DataFrame._reduce`, so by doing some introspection there we can keep backward compat. See new version:
    def blk_func(values, axis: Axis = 1):
        if isinstance(values, ExtensionArray):
            if not is_1d_only_ea_dtype(values.dtype) and not isinstance(
                self._mgr, ArrayManager
            ):
                return values._reduce(name, axis=1, skipna=skipna, **kwds)
            # Introspect the signature to see whether keepdims is supported.
            sign = signature(values._reduce)
            if "keepdims" in sign.parameters:
                return values._reduce(name, skipna=skipna, keepdims=True, **kwds)
            else:
                warnings.warn(
                    f"{type(values)}._reduce will require a `keepdims` parameter "
                    "in the future",
                    FutureWarning,
                    stacklevel=find_stack_level(),
                )
                # Unpack kwds like the other calls, then wrap the scalar result.
                result = values._reduce(name, skipna=skipna, **kwds)
                return np.array([result])
        else:
            return op(values, axis=axis, skipna=skipna, **kwds)
Notice especially the `FutureWarning` starting on line 10885 (the `warnings.warn(...)` call). This allows us to not require `keepdims` in third-party `_reduce` implementations now, even though `keepdims` is in the signature of `ExtensionArray._reduce`. In v3.0, we will drop the signature checking and only call `values._reduce` with `keepdims=True`, i.e. a `_reduce` without a `keepdims` parameter will fail in v3.0.
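
As a rough illustration of that v3.0 failure mode, the sketch below calls `_reduce` unconditionally with `keepdims=True`. `StrictLegacyArray` and `reduce_v3_style` are hypothetical names, and the sketch assumes a `_reduce` that accepts neither `keepdims` nor `**kwargs`; it is not the actual pandas code.

```python
class StrictLegacyArray:
    """Hypothetical array whose _reduce takes neither keepdims nor **kwargs."""

    def __init__(self, data):
        self.data = list(data)

    def _reduce(self, name, *, skipna=True):
        if name != "sum":
            raise NotImplementedError(name)
        return sum(self.data)


def reduce_v3_style(values, name, skipna=True):
    # Assumed post-deprecation behaviour: no signature check,
    # keepdims=True is always passed.
    return values._reduce(name, skipna=skipna, keepdims=True)


error_message = None
try:
    reduce_v3_style(StrictLegacyArray([1, 2, 3]), "sum")
except TypeError as exc:
    # Python rejects the unexpected keyword argument.
    error_message = str(exc)
```

If the legacy `_reduce` accepts `**kwargs` instead, Python would not raise here; `keepdims=True` would simply land in `kwargs`, so adding an explicit `keepdims` parameter is probably the safer migration.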
Check out the `test_reduction_without_keepdims` test in `pandas/tests/extension/decimal/test_decimal.py` for a test of what happens when extension arrays don't have a `keepdims` parameter in their `_reduce` method.
Thoughts? I prefer this new version, but it is easy to revert if needed.