DEPR: Deprecate returning a DataFrame in Series.apply · Issue #52116 · pandas-dev/pandas
Feature Type
- Adding new functionality to pandas
- Changing existing functionality in pandas
- Removing existing functionality in pandas
`Series.apply` (more precisely `SeriesApply.apply_standard`) normally returns a `Series` when given a callable, but if the callable itself returns a `Series`, it converts the resulting array of `Series` into a `DataFrame` and returns that:
```python
small_ser = pd.Series(range(3))
small_ser.apply(lambda x: pd.Series({"x": x, "x**2": x ** 2}))

   x  x**2
0  0     0
1  1     1
2  2     4
```
This approach is exceptionally slow and should generally be avoided. For example:
```python
import pandas as pd

ser = pd.Series(range(10_000))
%timeit ser.apply(lambda x: pd.Series({"x": x, "x**2": x ** 2}))
658 ms ± 623 µs per loop
```
In almost all cases, the same result can be obtained faster by other means. For example, the above can be written much faster as:
```python
%timeit ser.pipe(lambda x: pd.DataFrame({"x": x, "x**2": x ** 2}))
80.6 µs ± 682 ns per loop
```
In the rare case where a fast approach isn't possible, users should just construct the `DataFrame` manually from the list of `Series` instead of relying on the `Series.apply` method.
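For illustration, a minimal sketch of that manual construction (the variable names here are mine, not from pandas):

```python
import pandas as pd

ser = pd.Series(range(3))

# Build the row Series explicitly and let the DataFrame constructor
# stack them, rather than having Series.apply do the boxing.
rows = [pd.Series({"x": x, "x**2": x ** 2}) for x in ser]
df = pd.DataFrame(rows, index=ser.index)
```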
Also, allowing this construct complicates the return type of `SeriesApply.apply_standard` and therefore of the `apply` method. By removing the option to return a `DataFrame`, the signature of `SeriesApply.apply_standard` is simplified to simply `Series`.
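As a rough sketch of what that simplification would mean for the annotation (the exact current signature is my assumption, not copied from the pandas source):

```python
from __future__ import annotations

from pandas import DataFrame, Series

# Today: the callable may itself return a Series, so the result may be a DataFrame.
def apply_standard(self) -> DataFrame | Series: ...

# After the deprecation is enforced: always a Series.
def apply_standard_simplified(self) -> Series: ...
```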
All in all, this behavior is slow and makes things complicated without any real benefit, so I propose to deprecate returning a `DataFrame` from `SeriesApply.apply_standard`.