ENH: Add numba engine to groupby.transform by mroeschke · Pull Request #32854 · pandas-dev/pandas
Here are the timings for the above benchmark. I'll use a modified version of it for the ASV benchmark.
In [5]: import pandas as pd
   ...: import numpy as np
   ...: import time
   ...: np.random.seed(0)
   ...: ngroups = 1000
   ...: ndays = 100000
   ...: ncols = 100
   ...: foo = pd.DataFrame(
   ...:     index=pd.date_range(start=20000101, periods=ndays, freq="D"),
   ...:     data=np.random.randn(ndays, ncols),
   ...: )
   ...: foo[-1] = np.random.choice(ngroups, ndays)
In [6]: def function(values, index, columns):
   ...:     return values * 5
   ...:
In [7]: grouper = foo.groupby(-1)
# warm the cache (the first call triggers numba's JIT compilation)
In [8]: grouper.transform(function, engine="numba")
Out[8]:
0 1 2 3 ... 96 97 98 99
1970-01-01 00:00:00.020000101 8.820262 2.000786 4.893690 11.204466 ... 0.052500 8.929352 0.634560 2.009947
1970-01-02 00:00:00.020000101 9.415753 -6.738795 -6.352425 4.846984 ... 3.858953 4.117521 10.816180 6.682640
1970-01-03 00:00:00.020000101 -1.845909 -1.196896 5.498298 3.276319 ... 0.488625 2.914768 -1.997245 1.850279
1970-01-04 00:00:00.020000101 -6.532634 8.290653 -0.590820 -3.400891 ... 4.289620 5.705509 7.332894 4.262760
1970-01-05 00:00:00.020000101 -2.993270 -5.579485 3.833316 1.781464 ... -3.292765 -2.571170 -5.090209 -0.389274
... ... ... ... ... ... ... ... ... ...
2243-10-12 00:00:00.020000101 -0.235677 -0.319252 10.614723 -1.871743 ... 3.524537 5.565481 -2.199342 3.493679
2243-10-13 00:00:00.020000101 -6.501683 -1.439189 -5.445545 -6.564634 ... 5.365536 -5.383367 1.147402 1.660815
2243-10-14 00:00:00.020000101 -1.894160 0.401290 -0.528430 -2.900666 ... -0.678287 -1.696137 0.421033 2.729988
2243-10-15 00:00:00.020000101 0.002150 -3.285543 1.835571 6.569671 ... -0.486593 -4.820628 -0.368741 -3.181568
2243-10-16 00:00:00.020000101 11.840371 1.316436 -5.017203 3.308539 ... 2.338079 -9.723574 -1.719926 -3.700948
[100000 rows x 100 columns]
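A side note on the index shown above: the labels start in 1970 rather than 2000, which is consistent with the integer `start=20000101` being read as nanoseconds since the Unix epoch (an inference from the printed output, not something stated in this thread). This has no bearing on the timings. A minimal standard-library sketch of that arithmetic, with `format_label` as a hypothetical helper:

```python
from datetime import datetime, timedelta

# Assumption: the integer start value is interpreted as nanoseconds
# since 1970-01-01, which matches the index printed above.
start_ns = 20000101
ndays = 100000

# Split into whole seconds and the leftover nanoseconds, because
# datetime only resolves microseconds.
seconds, nanos = divmod(start_ns, 1_000_000_000)
first_day = datetime(1970, 1, 1) + timedelta(seconds=seconds)
last_day = first_day + timedelta(days=ndays - 1)


def format_label(day: datetime, nanos: int) -> str:
    """Render a timestamp with nanosecond digits (hypothetical helper)."""
    return f"{day:%Y-%m-%d %H:%M:%S}.{nanos:09d}"


first_label = format_label(first_day, nanos)
last_label = format_label(last_day, nanos)
print(first_label)  # 1970-01-01 00:00:00.020000101
print(last_label)   # 2243-10-16 00:00:00.020000101
```

Passing a date string such as `"2000-01-01"` instead of the integer would presumably give an index starting in the year 2000.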
In [9]: %timeit grouper.transform(function, engine="numba")
318 ms ± 2.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [10]: def function(values):
    ...:     return values * 5
    ...:
In [11]: %timeit grouper.transform(function, engine="cython")
17.8 s ± 178 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
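For a rough sense of scale, the two mean timings above can be compared directly. The sketch below only does the arithmetic on the numbers reported by `%timeit`. The variable names are illustrative, and the numba figure excludes the one-off compilation cost because of the warm-up call.

```python
# Mean timings copied from the %timeit output above, in seconds.
numba_seconds = 0.318   # engine="numba", measured after the warm-up call
cython_seconds = 17.8   # engine="cython"

# Ratio of the two means; a single-machine, single-benchmark figure.
speedup = cython_seconds / numba_seconds
print(f"numba is roughly {speedup:.0f}x faster on this benchmark")
# numba is roughly 56x faster on this benchmark
```

This ratio applies to this particular setup (1000 groups, 100000 rows, 100 columns, a trivial `values * 5` function) and may differ for other shapes or functions.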