REF: Decouple Series.apply from Series.agg by topper-123 · Pull Request #53400 · pandas-dev/pandas
I may not be experienced enough with the groupby and window methods to know, but I think it's worth looking into whether the `Series` and `DataFrame` methods can have their undesired behavior deprecated using a normal process, i.e. whether we can avoid a parallel implementation for them.
TBH, this has always been a very complex area, and it was only after doing #53362 (i.e. very recently) that I started thinking it could be possible to do this with a normal deprecation process. I may still have missed something and be proven wrong of course, but that's part of the discussion...
The way I see it, after this PR and #53325, `SeriesApply.agg` will, for example, be (shortened a bit):
    def agg(self):
        result = super().agg()
        if result is None:
            obj = self.obj
            f = self.f
            try:
                result = obj.apply(f)
            except (ValueError, AttributeError, TypeError):
                result = f(self.obj)
            else:
                msg = (
                    f"using {f} in {type(obj).__name__}.agg cannot aggregate and "
                    f"has been deprecated. Use {type(obj).__name__}.transform to "
                    f"keep behavior unchanged."
                )
                warnings.warn(msg, FutureWarning, stacklevel=find_stack_level())
        return result
The above just deprecates calling `obj.apply(f)`, and we are actually not guaranteed that `f(obj)` returns an aggregated value. For example, with `f = lambda x: np.sqrt(x)`, `agg` will return transformed values, not an aggregate.
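To make the `np.sqrt` case concrete, here is a small sketch that calls `f` on the Series directly. It deliberately avoids `Series.agg` itself, since the behavior of `agg` here is exactly what is under discussion and may differ between pandas versions. The example data is made up for illustration.

```python
import numpy as np
import pandas as pd

ser = pd.Series([1.0, 4.0, 9.0])  # illustrative data, not from the PR

f = lambda x: np.sqrt(x)

# np.sqrt is elementwise, so f(ser) keeps the shape and index of the input.
transformed = f(ser)
print(type(transformed).__name__, len(transformed))

# A genuine reducer collapses the Series to a single scalar instead.
reduced = ser.sum()
print(pd.api.types.is_scalar(reduced))
```

So falling back to `f(obj)` says nothing by itself about whether the result is an aggregate; that has to be checked on the result.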
But if we change the above to:
    def agg(self):
        result = super().agg()
        if result is None:
            obj = self.obj
            f = self.f
            try:
                result = obj.apply(f)
            except (ValueError, AttributeError, TypeError):
                result = f(self.obj)
            if not self._is_aggregate_value(result):  # aside: how do we know something is an aggregate value?
                msg = (
                    f"using {f} in {type(obj).__name__}.agg cannot aggregate and "
                    f"has been deprecated. Use {type(obj).__name__}.transform to "
                    f"keep behavior unchanged."
                )
                warnings.warn(msg, FutureWarning, stacklevel=find_stack_level())
        return result
then returning a non-aggregate value will emit a warning. So the question is whether the above is backward compatible and whether, if users fix their code so that it no longer emits warnings, their code will be compatible with v3.0.
In pandas v3.0 the method will become:
    def agg(self):
        result = super().agg()
        if result is None:
            # bind obj and f locally; the message below refers to both
            obj = self.obj
            f = self.f
            result = f(obj)
            if not self._is_aggregate_value(result):
                msg = (
                    f"using {f} in {type(obj).__name__}.agg cannot aggregate and "
                    f"has been deprecated. Use {type(obj).__name__}.transform to "
                    f"keep behavior unchanged."
                )
                warnings.warn(msg, FutureWarning, stacklevel=find_stack_level())
        return result
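On the aside about how to know whether something is an aggregate value: one possible rule, sketched below, is to treat anything that is not list-like as an aggregate. The function `is_aggregate_value` is a hypothetical stand-in for the `_is_aggregate_value` method used above and is not existing pandas API; `pandas.api.types.is_list_like` is a real helper.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_list_like


def is_aggregate_value(result) -> bool:
    """Hypothetical stand-in for `_is_aggregate_value` (not pandas API).

    Assumption: an aggregate is any result that is not list-like,
    e.g. a number, a string or None.
    """
    return not is_list_like(result)


ser = pd.Series([1.0, 4.0, 9.0])  # illustrative data

print(is_aggregate_value(ser.max()))     # True: a single scalar
print(is_aggregate_value(np.sqrt(ser)))  # False: a Series as long as the input
```

This rule is intentionally strict. A reducer that returns a container, such as `lambda x: list(x)` or `set`, would be classified as non-aggregating, so whether such functions should warn is part of the open question.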
The same can be done with `Series.transform`, where we also need to check that the result is a transformed result and not an aggregation or something else. Doing it with `Series.apply`, however, requires an `array_ops_only` parameter.
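The corresponding check for `Series.transform` could look roughly like the sketch below. The name `is_transform_value` is hypothetical, and the rule it encodes (the result is a `Series` or `DataFrame` with the same index as the input) is an assumption about what "a transformed result" should mean here.

```python
import numpy as np
import pandas as pd


def is_transform_value(result, obj: pd.Series) -> bool:
    """Hypothetical check (not pandas API).

    Assumption: a transformed result is a Series or DataFrame whose
    index is identical to the index of the input.
    """
    return isinstance(result, (pd.Series, pd.DataFrame)) and result.index.equals(
        obj.index
    )


ser = pd.Series([1.0, 4.0, 9.0], index=list("abc"))  # illustrative data

print(is_transform_value(np.sqrt(ser), ser))  # True: same index as the input
print(is_transform_value(ser.sum(), ser))     # False: a scalar
print(is_transform_value(ser.head(2), ser))   # False: index was shortened
```

Comparing indexes rather than only lengths also rejects results that filter or reorder rows, which are neither aggregations nor transformations.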
My suspicion is that if the above can be done on `Series.(agg|transform|apply)`, then `DataFrame.(agg|transform|apply)` will also fall into place easily. I haven't looked enough into `(Groupby|Resample|Window).(agg|transform|apply)` to have a clear opinion, but maybe everything will fall into place like a puzzle there too, if the `Series`/`DataFrame` methods can be deprecated correctly. And if not, the new `Series`/`DataFrame` methods could be used in new versions of the `(Groupby|Resample|Window)` methods, easing their implementations?