PERF: groupby.transform broadcast perf by lukemanley · Pull Request #45708 · pandas-dev/pandas
`groupby.transform` broadcasts the result of a user-defined function back to each group's shape by passing a repeated array to `np.concatenate`. Passing the array once to `np.tile` and letting NumPy do the repetition is considerably faster: about 2.5x on the benchmark below.
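A minimal NumPy-only sketch of the two broadcasting strategies, assuming the UDF has already reduced a group to a single 1-D row that must be repeated once per row of the group. The helper names `broadcast_concat` and `broadcast_tile` are hypothetical and are not pandas internals. Both produce the same array; the likely reason `np.tile` wins is that `np.concatenate` has to walk a Python list holding one entry per output row, while `np.tile` receives the row once.

```python
import timeit

import numpy as np


def broadcast_concat(row: np.ndarray, n_rows: int) -> np.ndarray:
    """Pre-PR style (sketch): build a Python list with n_rows references
    to the same row, concatenate them into one flat array, then reshape."""
    return np.concatenate([row] * n_rows).reshape(n_rows, row.shape[0])


def broadcast_tile(row: np.ndarray, n_rows: int) -> np.ndarray:
    """PR style (sketch): pass the row to np.tile once and let NumPy
    repeat it n_rows times along a new leading axis."""
    return np.tile(row, (n_rows, 1))


# Hypothetical example: one group's column-wise max, broadcast to 200k rows.
row = np.random.rand(3)
n_rows = 200_000

a = broadcast_concat(row, n_rows)
b = broadcast_tile(row, n_rows)
assert a.shape == b.shape == (n_rows, 3)
assert np.array_equal(a, b)  # identical output, only the cost differs

# Rough timing comparison; absolute numbers depend on the machine.
for fn in (broadcast_concat, broadcast_tile):
    best = min(timeit.repeat(lambda: fn(row, n_rows), number=5, repeat=3))
    print(f"{fn.__name__}: {best / 5 * 1e3:.2f} ms per call")
```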
```python
import numpy as np
import pandas as pd
N = 1_000_000
df = pd.DataFrame(
    data=np.random.rand(N, 3),
    index=np.random.randint(0, 5, N)
)
%timeit df.groupby(level=0).transform(lambda x: np.max(x, axis=0))
```

```
302 ms ± 8.18 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) <- main
118 ms ± 2.02 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) <- PR
```
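The lambda in the benchmark reduces each group to one row of column maxima, and `transform` then broadcasts that row back to every row of the group, which is the path this PR changes. The sketch below checks that behavior on a smaller frame (`N = 10_000` and a seeded `default_rng` are assumptions, not the PR's setup) by comparing it with the built-in `"max"` alias, which should produce the same values without going through the user-defined-function broadcasting code.

```python
import numpy as np
import pandas as pd

# Smaller, seeded version of the benchmark frame (the PR uses N = 1_000_000).
N = 10_000
rng = np.random.default_rng(0)
df = pd.DataFrame(
    data=rng.random((N, 3)),
    index=rng.integers(0, 5, N),
)

# UDF path: each group is reduced to a Series of column maxima, which
# transform broadcasts back to the group's original shape and index.
udf_result = df.groupby(level=0).transform(lambda x: np.max(x, axis=0))

# Built-in alias: same values, computed without a Python-level UDF.
builtin_result = df.groupby(level=0).transform("max")

pd.testing.assert_frame_equal(udf_result, builtin_result)
print(udf_result.head())
```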