API: clarify DataFrame.apply reduction on empty frames by wolever · Pull Request #6007 · pandas-dev/pandas
Add an `is_reduction` argument to `DataFrame.apply` to avoid undefined behavior when `.apply` is called on an empty DataFrame and the function being applied is only defined for valid inputs.
Currently, if the DataFrame is empty, a guess is made at the correct return value (either a `DataFrame` or a `Series`) by calling the function being applied with an empty `Series` as an argument:
    if not all(self.shape):
        # How to determine this better?
        is_reduction = False
        try:
            is_reduction = not isinstance(f(_EMPTY_SERIES), Series)
        except Exception:
            pass
        if is_reduction:
            return Series(NA, index=self._get_agg_axis(axis))
        else:
            return self.copy()
For reduction functions that produce undefined results on unexpected input (e.g. a function that doesn't expect an empty argument), this means that the result of `apply` is also undefined.
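To make the failure mode concrete, here is a minimal sketch that mimics the heuristic quoted above outside of pandas internals. The names `EMPTY_SERIES`, `guess_is_reduction`, `first_value`, and `doubled` are hypothetical and exist only for illustration; `EMPTY_SERIES` is assumed to behave like the internal `_EMPTY_SERIES`.

```python
import pandas as pd

# Assumed stand-in for pandas' internal _EMPTY_SERIES.
EMPTY_SERIES = pd.Series([], dtype=float)


def guess_is_reduction(func):
    """Probe func with an empty Series, as the quoted snippet does."""
    try:
        return not isinstance(func(EMPTY_SERIES), pd.Series)
    except Exception:
        # Any failure is swallowed, so func is treated as a non-reduction.
        return False


def first_value(col):
    # A reduction that is only defined for non-empty input.
    return col.iloc[0]


def doubled(col):
    # A transformation: returns a Series of the same length.
    return col * 2


print(guess_is_reduction(sum))          # True
print(guess_is_reduction(doubled))      # False
print(guess_is_reduction(first_value))  # False, although first_value is a reduction
```

Because `first_value` raises on empty input, the probe misclassifies it, and `apply` would return a copy of the frame instead of a `Series`.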
This pull request adds an explicit `is_reduction` argument so that this otherwise undefined behavior can be controlled directly.
Update: it has been suggested that the existing `reduce` argument be used instead. Is this reasonable? The PR would be updated as follows:
- Remove the `is_reduction` argument
- Change the default value of `reduce` from `True` to `None` (to preserve the current behavior of checking the return value of the function being applied)
- In the case of an empty DataFrame, treat `reduce` in the same way that I'm currently treating `is_reduction`
- Otherwise treat `reduce` as normal
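The empty-frame handling described in the list above could look roughly like the following sketch. It is not the code in this PR: `apply_on_empty` is a hypothetical free function standing in for the empty-frame branch of `DataFrame.apply`, and the choice of `df.columns` or `df.index` is an assumption about what `_get_agg_axis(axis)` returns.

```python
import numpy as np
import pandas as pd


def apply_on_empty(df, func, axis=0, reduce=None):
    """Hypothetical helper: result of apply on an empty frame."""
    if reduce is None:
        # Default: keep the current behavior and guess from the return value.
        try:
            reduce = not isinstance(func(pd.Series([], dtype=float)), pd.Series)
        except Exception:
            reduce = False
    if reduce:
        # Assumed equivalent of self._get_agg_axis(axis).
        agg_index = df.columns if axis == 0 else df.index
        return pd.Series(np.nan, index=agg_index)
    return df.copy()


def first_value(col):
    # A reduction that fails on empty input.
    return col.iloc[0]


empty = pd.DataFrame(columns=["a", "b"])

guessed = apply_on_empty(empty, first_value)               # guess fails: frame copy
forced = apply_on_empty(empty, first_value, reduce=True)   # NaN Series indexed by a, b
kept = apply_on_empty(empty, sum, reduce=False)            # frame copy, no probing
```

With `reduce=None` the caller gets today's guess; passing `True` or `False` skips the probe entirely, so the function being applied is never called with an empty `Series`.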
Ref: