API: clarify DataFrame.apply reduction on empty frames by wolever · Pull Request #6007 · pandas-dev/pandas

Add an is_reduction argument to DataFrame.apply to avoid undefined behavior when .apply is called on an empty DataFrame and the function being applied is only defined for valid inputs.

Currently, if the DataFrame is empty, a guess is made at the correct return value (either a DataFrame or a Series) by calling the function being applied with an empty Series as an argument:

if not all(self.shape):
    # How to determine this better?
    is_reduction = False
    try:
        is_reduction = not isinstance(f(_EMPTY_SERIES), Series)
    except Exception:
        pass

    if is_reduction:
        return Series(NA, index=self._get_agg_axis(axis))
    else:
        return self.copy()

For reduction functions which produce undefined results on unexpected input (e.g. a function which doesn't expect an empty argument), this means that the result of apply is also undefined.
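To illustrate the problem, here is a minimal sketch of the probing heuristic using only public pandas calls. The helper guess_is_reduction and the sample functions first_value and count are hypothetical names introduced for this example; they are not part of pandas, and an explicitly typed empty Series stands in for the private _EMPTY_SERIES.

```python
import pandas as pd


def guess_is_reduction(func):
    """Probe func with an empty Series, mirroring the heuristic quoted above."""
    try:
        # A non-Series result is taken to mean "func reduces to a scalar".
        return not isinstance(func(pd.Series([], dtype=float)), pd.Series)
    except Exception:
        # Any failure silently falls back to "not a reduction".
        return False


def first_value(s):
    # A reduction that is only defined for non-empty input.
    return s.iloc[0]


def count(s):
    # A reduction that happens to tolerate empty input.
    return len(s)


print(guess_is_reduction(count))        # True
print(guess_is_reduction(first_value))  # False, although it is a reduction
```

Because first_value raises on the empty probe, the exception is swallowed and the function is classified as a non-reduction, so the guess depends on how the function happens to fail rather than on what it computes.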

This pull request adds an is_reduction argument so that callers can explicitly control this otherwise undefined behavior.
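The intended effect can be sketched with a standalone function. This is only an approximation under stated assumptions: apply_on_empty is a hypothetical stand-in for the empty-frame branch of DataFrame.apply, not the code in this PR, and it replaces the private _get_agg_axis with the public columns/index attributes.

```python
import numpy as np
import pandas as pd


def apply_on_empty(df, func, axis=0, is_reduction=None):
    """Hypothetical stand-in for the empty-frame branch of DataFrame.apply."""
    if is_reduction is None:
        # No explicit answer from the caller: fall back to the probe-based guess.
        try:
            probe = func(pd.Series([], dtype=float))
            is_reduction = not isinstance(probe, pd.Series)
        except Exception:
            is_reduction = False
    if is_reduction:
        # One NaN per label on the axis that survives the reduction.
        agg_axis = df.columns if axis == 0 else df.index
        return pd.Series(np.nan, index=agg_axis)
    return df.copy()


def first_value(s):
    # A reduction that raises on empty input.
    return s.iloc[0]


empty = pd.DataFrame(columns=["a", "b"])
guessed = apply_on_empty(empty, first_value)
explicit = apply_on_empty(empty, first_value, is_reduction=True)
print(type(guessed).__name__)   # DataFrame
print(type(explicit).__name__)  # Series
```

With the flag left unset the result type follows the guess; passing is_reduction=True yields a NaN-filled Series indexed by the columns regardless of how the function behaves on empty input.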

Update: it has been suggested that the existing reduce argument be used instead of adding a new one. Is this reasonable? The PR would be updated as follows:

Ref: