PERF: masked ops for reductions (sum) by jorisvandenbossche · Pull Request #30982 · pandas-dev/pandas
The current nanops code carries quite a bit of complexity that is not needed for the masked arrays. This is a small proof of concept for separate implementations for our nullable masked arrays, using the sum case as the example (and still ignoring the additional kwargs).
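To illustrate the idea, here is a minimal NumPy sketch. It assumes a masked array is stored as a `values` ndarray plus a boolean `mask` where True means missing (which is how pandas' nullable arrays keep their `_data` and `_mask`). The function name `masked_sum` is hypothetical, `None` stands in for `pd.NA`, and, like the PR, it handles only `skipna` and ignores other kwargs such as `min_count`.

```python
import numpy as np


def masked_sum(values: np.ndarray, mask: np.ndarray, skipna: bool = True):
    """Sum `values`, treating positions where `mask` is True as missing.

    Hypothetical helper for illustration only; kwargs that nanops supports
    (min_count, axis, ...) are deliberately ignored.
    """
    if not skipna and mask.any():
        # A missing value propagates when NA values are not skipped.
        return None  # stand-in for pd.NA
    # Sum only the valid entries, directly in the original integer dtype:
    # no cast to float64 and no NaN filling is needed.
    return np.sum(values, where=~mask)


values = np.array([1, 2, 3, 0], dtype="int64")
mask = np.array([False, False, False, True])
print(masked_sum(values, mask))                # -> 6
print(masked_sum(values, mask, skipna=False))  # -> None
```

Because the mask already records where the missing values are, the sum never has to rediscover them; the `where=` argument to `np.sum` (NumPy 1.17 or later) skips the masked positions without building a filtered copy.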
The masked implementation in this PR is also quite a bit faster. On master:
In [1]: a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, np.nan]*1000)
In [2]: s1 = pd.Series(a, dtype="Int64")
In [3]: s2 = pd.Series(a, dtype="float64")
In [4]: %timeit s1.sum()
79.1 µs ± 2.39 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [5]: %timeit s2.sum()
79.1 µs ± 1.48 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
So the nullable Int64 path does basically the same work as the nanops implementation for float64.
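For comparison, a rough sketch of the extra work a float-based path implies. It assumes the path converts the integers to float64, writes NaN into the missing slots and then calls a NaN-aware sum; `np.nansum` stands in for the nanops machinery, and `coerce_then_nansum` is a hypothetical name. The real nanops functions also handle things like `min_count` and dtype-specific checks, which this sketch leaves out.

```python
import numpy as np


def coerce_then_nansum(values: np.ndarray, mask: np.ndarray) -> float:
    """Hypothetical float-coercing path: int -> float64, NaN-fill, nansum."""
    # Pass 1: allocate a float64 copy of the integer data.
    data = values.astype("float64")
    # Pass 2: write NaN into the missing positions.
    data[mask] = np.nan
    # Pass 3: the NaN-aware sum detects (and skips) the NaNs again,
    # even though `mask` already says where they are.
    return float(np.nansum(data))


# Same pattern as the benchmark data: 1..9 followed by one missing value.
values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 0] * 1000, dtype="int64")
mask = np.tile([False] * 9 + [True], 1000)
print(coerce_then_nansum(values, mask))  # -> 45000.0
```

Each step is a full pass over the data; a mask-aware implementation can skip the cast and the NaN handling entirely and sum the integers directly.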
With this PR:
In [4]: %timeit s1.sum()
21.5 µs ± 69.9 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [5]: %timeit s2.sum()
79.8 µs ± 1.03 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
(With bigger arrays the relative speed-up shrinks: here it is almost 4x with 10k elements, but with 1M elements it is around 2x.)
I think it would be interesting to gradually implement some of the ops specifically for the nullable masked arrays. Personally, I think this will be easier to do in new functions than by trying to fit it into the existing nanops functions.
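One possible shape for such new functions, sketched under the same `values` + `mask` assumption (True means missing): a small shared helper that reduces only the valid entries, reused per operation. The names `_masked_reduce`, `masked_prod`, `masked_min` and `masked_max` are hypothetical and not pandas API, `None` stands in for `pd.NA`, and additional kwargs are again ignored.

```python
from typing import Callable

import numpy as np


def _masked_reduce(
    func: Callable[[np.ndarray], object],
    values: np.ndarray,
    mask: np.ndarray,
    skipna: bool = True,
    has_identity: bool = True,
):
    """Apply a NumPy reduction to the unmasked entries only (hypothetical)."""
    if not skipna and mask.any():
        return None  # a missing value propagates; stand-in for pd.NA
    valid = values[~mask]  # boolean indexing copies just the valid entries
    if valid.size == 0 and not has_identity:
        # min/max have no identity element, so an all-missing input is NA.
        return None
    return func(valid)


def masked_prod(values, mask, skipna=True):
    return _masked_reduce(np.prod, values, mask, skipna)


def masked_min(values, mask, skipna=True):
    return _masked_reduce(np.min, values, mask, skipna, has_identity=False)


def masked_max(values, mask, skipna=True):
    return _masked_reduce(np.max, values, mask, skipna, has_identity=False)


values = np.array([3, 1, 4, 0], dtype="int64")
mask = np.array([False, False, False, True])
print(masked_prod(values, mask), masked_min(values, mask), masked_max(values, mask))
# -> 12 1 4
```

Boolean indexing copies the valid entries. For sum and prod one could pass `where=~mask` instead to avoid that copy, but `np.min` and `np.max` then require an explicit `initial` value, which is why this sketch sticks with the simpler indexing approach.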