ENH: Add numba engine to several rolling aggregations by mroeschke · Pull Request #38895 · pandas-dev/pandas

Hi @mroeschke — not sure this is the best forum, but saw this in the release notes and wanted to confirm whether my read of the code was correct.

I'm one of the developers of numbagg and of xarray (in which we use numbagg). (And in the olden days, I made occasional pandas contributions.)

IIUC, a rolling numba mean in pandas will call generate_numba_apply_func with func as np.mean. That will then compute the window bounds at each step and apply the numba-compiled function over the whole window each time.

Is that correct? Do you have any thoughts on the relative efficiency of that vs. a "rolling algo", i.e. one that keeps a running sum and count, adding one new value and subtracting one existing value at each step? I had thought that a rolling algo would be significantly faster, particularly for large windows, but I haven't tested it, and perhaps you considered this already?

IIUC, the cython functions in pandas are rolling algos. And here's an example of that implemented with numba in numbagg: https://github.com/numbagg/numbagg/blob/v0.2.1/numbagg/moving.py#L85
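To make the comparison concrete, here is a minimal sketch of the two approaches in plain NumPy. It is not the pandas or numbagg code; the function names are hypothetical, and it assumes a fixed window, no NaNs, and output only once the window is full (so the count is always the window size). In practice both loops would be wrapped in numba.njit.

```python
import numpy as np


def rolling_mean_recompute(values, window):
    """Hypothetical helper: recompute each full window from scratch, O(n * window)."""
    values = np.asarray(values, dtype=np.float64)
    out = np.full(values.shape[0], np.nan)
    for i in range(window - 1, values.shape[0]):
        # Aggregate over the entire window at every step.
        out[i] = np.mean(values[i - window + 1 : i + 1])
    return out


def rolling_mean_running(values, window):
    """Hypothetical helper: keep a running sum, O(n) regardless of window size."""
    values = np.asarray(values, dtype=np.float64)
    out = np.full(values.shape[0], np.nan)
    running_sum = 0.0
    for i in range(values.shape[0]):
        running_sum += values[i]  # add the value entering the window
        if i >= window:
            running_sum -= values[i - window]  # subtract the value leaving it
        if i >= window - 1:
            # Assumes no NaNs, so the count is always `window`.
            out[i] = running_sum / window
    return out
```

The first version does work proportional to the window size at every step, while the second does a constant amount, which is why I'd expect the gap to grow with larger windows. One trade-off: the running sum accumulates floating-point error from repeated add/subtract, which a recompute-per-window approach avoids.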

Thanks in advance, and congrats on getting this into pandas!