ENH: Add numba engine to groupby.transform by mroeschke · Pull Request #32854 · pandas-dev/pandas

Here are the timings with the above benchmark. I'll use a modified version of it for the ASV benchmark.

In [5]: import pandas as pd
   ...: import numpy as np
   ...: import time
   ...: np.random.seed(0)
   ...: ngroups = 1000
   ...: ndays = 100000
   ...: ncols = 100
   ...: foo = pd.DataFrame(
   ...:     # an integer start is read as nanoseconds since the epoch, hence the 1970 index in Out[8]
   ...:     index=pd.date_range(start=20000101, periods=ndays, freq="D"),
   ...:     data=np.random.randn(ndays, ncols),
   ...: )
   ...: foo[-1] = np.random.choice(ngroups, ndays)

In [6]: def function(values, index, columns):
   ...:     return values * 5
   ...:

In [7]: grouper = foo.groupby(-1)
In [8]: # warm the cache so the timing in In [9] excludes JIT compilation
   ...: grouper.transform(function, engine="numba")
Out[8]:
                                      0         1          2          3   ...        96        97         98        99
1970-01-01 00:00:00.020000101   8.820262  2.000786   4.893690  11.204466  ...  0.052500  8.929352   0.634560  2.009947
1970-01-02 00:00:00.020000101   9.415753 -6.738795  -6.352425   4.846984  ...  3.858953  4.117521  10.816180  6.682640
1970-01-03 00:00:00.020000101  -1.845909 -1.196896   5.498298   3.276319  ...  0.488625  2.914768  -1.997245  1.850279
1970-01-04 00:00:00.020000101  -6.532634  8.290653  -0.590820  -3.400891  ...  4.289620  5.705509   7.332894  4.262760
1970-01-05 00:00:00.020000101  -2.993270 -5.579485   3.833316   1.781464  ... -3.292765 -2.571170  -5.090209 -0.389274
...                                  ...       ...        ...        ...  ...       ...       ...        ...       ...
2243-10-12 00:00:00.020000101  -0.235677 -0.319252  10.614723  -1.871743  ...  3.524537  5.565481  -2.199342  3.493679
2243-10-13 00:00:00.020000101  -6.501683 -1.439189  -5.445545  -6.564634  ...  5.365536 -5.383367   1.147402  1.660815
2243-10-14 00:00:00.020000101  -1.894160  0.401290  -0.528430  -2.900666  ... -0.678287 -1.696137   0.421033  2.729988
2243-10-15 00:00:00.020000101   0.002150 -3.285543   1.835571   6.569671  ... -0.486593 -4.820628  -0.368741 -3.181568
2243-10-16 00:00:00.020000101  11.840371  1.316436  -5.017203   3.308539  ...  2.338079 -9.723574  -1.719926 -3.700948

[100000 rows x 100 columns]
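
The warm-up call matters because a JIT-compiled function pays its compilation cost on first use, so timing that first call would mostly measure the compiler. Here is a minimal sketch of the same warm-up-then-time pattern outside IPython, using only the standard library. `time_after_warmup` is a hypothetical helper written for illustration and is not part of pandas or numba.

```python
import time


def time_after_warmup(func, repeats=3):
    """Call func once untimed, then return the best of `repeats` timed calls.

    Hypothetical helper for illustration; not a pandas or numba API.
    """
    func()  # warm-up call: any one-off compilation or caching happens here
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return min(timings)
```

With the objects from this session it could be called as `time_after_warmup(lambda: grouper.transform(function, engine="numba"))`. `%timeit` reports a mean and standard deviation rather than a minimum, so the numbers would not match the ones here exactly.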

In [9]: %timeit grouper.transform(function, engine="numba")
318 ms ± 2.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [10]: def function(values):
    ...:     return values * 5
    ...:

In [11]: %timeit grouper.transform(function, engine="cython")
17.8 s ± 178 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
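
For a rough sense of scale, the ratio of the two timings can be worked out directly. This is only arithmetic on the two means reported above. It ignores the reported standard deviations, and the two runs use slightly different function signatures (three arguments for numba, one for cython), though both compute `values * 5`.

```python
# Mean timings copied from the %timeit output above, in seconds.
numba_seconds = 318e-3
cython_seconds = 17.8

# Ratio of the two means on this one benchmark and machine.
speedup = cython_seconds / numba_seconds
print(f"numba engine is roughly {speedup:.0f}x faster on this benchmark")
```

That works out to roughly 56x for this particular setup (1000 groups, 100000 rows, 100 columns). Other shapes or functions may behave differently.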