
The groupby.transform fast path is currently taken only for DataFrame -> DataFrame operations. This PR also uses the fast path for DataFrame -> Series operations. The performance impact is most noticeable for wide DataFrames. As far as I can see, almost all existing benchmarks use tall DataFrames with only a few columns, so I added a benchmark with a wide DataFrame to cover this case.

import numpy as np
from pandas import DataFrame

n = 1000
df = DataFrame(
    np.random.randn(n, n),
    index=np.random.choice(range(10), n),
)

%timeit df.groupby(level=0).transform(lambda x: np.max(x, axis=0))

562 ms ± 7.93 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)   # <- main
76.2 ms ± 313 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)  # <- PR
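
For readers unfamiliar with the term, here is a minimal sketch of what a DataFrame -> Series transform looks like from the user's side: the function receives each group as a DataFrame and returns one value per column, which transform then broadcasts back to every row of the group. The small frame and the variable names are illustrative assumptions, not part of the PR; the comparison against the built-in "max" is only a sanity check on the result, not a statement about which internal path is taken.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (hypothetical data): 6 rows, 3 columns,
# with a repeated index so that level=0 defines three groups.
df = pd.DataFrame(
    np.arange(18, dtype=float).reshape(6, 3),
    index=[0, 1, 0, 1, 2, 2],
)

# DataFrame -> Series: for each group (a DataFrame), np.max(..., axis=0)
# returns a Series holding one maximum per column.
per_group_max = df.groupby(level=0).transform(lambda x: np.max(x, axis=0))

# The built-in aggregation name should give the same values.
builtin_max = df.groupby(level=0).transform("max")

print(per_group_max.shape == df.shape)
print(np.allclose(per_group_max.to_numpy(), builtin_max.to_numpy()))
```

The result has the same shape as the input, which is why the cost of the per-column handling grows with the number of columns and shows up mainly on wide frames.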