Add Numba as an optional dependency for rolling.apply for pandas 1.0 · Issue #28987 · pandas-dev/pandas

As mentioned on the pandas dev call last week, I've been working with @jreback and @DiegoAlbertoTorres on a proof of concept (POC) implementing rolling.mean and rolling.apply using Numba instead of our current Cython implementation. As described in this proof of concept document, we worked on:

  1. Refactoring window bound calculation and aggregation to use Numba
  2. Developing a new API for users to implement their own window bounds calculations (df.rolling(MyWindowerClass()).mean())
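To make the second item concrete, here is a minimal plain-Python sketch of what a user-defined windower could look like. It is not the POC's actual API: the class names, the `get_window_bounds` method name, and the `rolling_mean` helper are all hypothetical, and `min_periods` handling is deliberately left out. The idea it shows is that the windower only produces start/end positions, and the aggregation loops over those bounds.

```python
from typing import List, Sequence, Tuple


class TrailingWindower:
    """Hypothetical windower: each window ends at the current row."""

    def __init__(self, size: int) -> None:
        self.size = size

    def get_window_bounds(self, num_values: int) -> Tuple[List[int], List[int]]:
        # Half-open bounds [start, end) for every row.
        start = [max(0, i - self.size + 1) for i in range(num_values)]
        end = [i + 1 for i in range(num_values)]
        return start, end


class ForwardWindower:
    """Hypothetical windower: each window starts at the current row."""

    def __init__(self, size: int) -> None:
        self.size = size

    def get_window_bounds(self, num_values: int) -> Tuple[List[int], List[int]]:
        start = list(range(num_values))
        end = [min(num_values, i + self.size) for i in range(num_values)]
        return start, end


def rolling_mean(values: Sequence[float], windower) -> List[float]:
    """Aggregate over whatever bounds the windower returns (no min_periods)."""
    start, end = windower.get_window_bounds(len(values))
    out = []
    for s, e in zip(start, end):
        window = values[s:e]
        out.append(sum(window) / len(window) if window else float("nan"))
    return out


data = [1.0, 2.0, 3.0, 4.0]
trailing = rolling_mean(data, TrailingWindower(2))
forward = rolling_mean(data, ForwardWindower(2))
```

Because the aggregation never needs to know how the bounds were produced, swapping the windower changes the window shape without touching the mean itself.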

The document details performance results, high-level implementation details, and a plan for integrating the work into pandas. The fork is up to date with master and runs the same CI checks.

The proposal for pandas 1.0 is:

  1. Introduce Numba as a required pandas dependency
  2. Have rolling.mean dispatch to the Numba implementation

The proposal for post pandas 1.0 is:

  1. Expose the new API for users to implement their own window bounds calculations
  2. Implement all rolling aggregations (min, max, count, etc.) in Numba
  3. Implement EWM and Expanding in Numba
  4. Generalize data grouping (groupby, rolling, resample) and aggregations (mean, max, etc.) using Numba jitclasses
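For readers less familiar with Expanding and EWM, the kernels in item 3 can be sketched as explicit loops, which is the style of code Numba's `njit` is designed to compile. This is only an illustration in plain Python, not code from the POC: the function names are made up here, missing values are ignored, and the EWM version assumes the simple recursive form (what pandas calls `adjust=False`).

```python
from typing import List, Sequence


def expanding_mean(values: Sequence[float]) -> List[float]:
    """Mean of all values seen so far, kept as a running sum."""
    out = []
    total = 0.0
    for count, x in enumerate(values, start=1):
        total += x
        out.append(total / count)
    return out


def ewm_mean(values: Sequence[float], alpha: float) -> List[float]:
    """Exponentially weighted mean, recursive form (assumed adjust=False):
    y[0] = x[0];  y[t] = (1 - alpha) * y[t-1] + alpha * x[t]
    """
    out = []
    prev = 0.0
    for i, x in enumerate(values):
        prev = x if i == 0 else (1.0 - alpha) * prev + alpha * x
        out.append(prev)
    return out


expanding = expanding_mean([1.0, 2.0, 3.0, 4.0])
smoothed = ewm_mean([1.0, 2.0, 3.0], alpha=0.5)
```

Both kernels carry a small amount of state from one row to the next, which is why they are written as a single pass and not as a window slice per row.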

As the issue title notes, I'm hoping to keep this discussion focused on any concerns, thoughts, or suggestions regarding the POC and how it pertains to the pandas 1.0 proposal, but feel free to ask questions about topics the POC doesn't cover.

cc @pandas-dev/pandas-core