PERF: masked ops for reductions (sum) by jorisvandenbossche · Pull Request #30982 · pandas-dev/pandas

The current nanops code carries a fair amount of complexity that the masked arrays do not need. This is a small proof of concept for separate implementations for our nullable masked arrays, starting with the sum case (and, for now, still ignoring the additional kwargs).

This is also quite a bit faster. On master:

In [1]: a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, np.nan]*1000)  

In [2]: s1 = pd.Series(a, dtype="Int64")

In [3]: s2 = pd.Series(a, dtype="float64") 

In [4]: %timeit s1.sum()   
79.1 µs ± 2.39 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [5]: %timeit s2.sum() 
79.1 µs ± 1.48 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

So on master the nullable Int64 basically does the same as the nanops implementation for float.
With this PR:

In [4]: %timeit s1.sum()  
21.5 µs ± 69.9 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [5]: %timeit s2.sum() 
79.8 µs ± 1.03 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

(With bigger arrays, the relative speed-up becomes smaller: here it is almost 4x with 10k elements, but with 1M elements it is around 2x.)

I think it would be interesting to gradually implement some of the ops specifically for the nullable masked arrays. Personally, I think it will be easier to do this in new functions than to try to fit it into the existing nanops function.
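
To make the idea concrete, here is a minimal sketch of what a mask-aware sum could look like. It is not the code in this PR: the name `masked_sum`, its signature, and the use of `None` as a stand-in for a missing-value scalar are all assumptions for illustration, and it relies on the `where` argument of `np.sum` (NumPy 1.17 or later).

```python
import numpy as np


def masked_sum(values, mask, skipna=True):
    """Sum `values`, treating positions where `mask` is True as missing.

    Hypothetical helper for illustration only. `None` stands in for a
    missing-value scalar such as pd.NA.
    """
    if not skipna and mask.any():
        # A missing value propagates when it is not skipped.
        return None
    # The mask is already stored alongside the data, so there is no need
    # to scan the values for NaN first: just sum the unmasked positions.
    return np.sum(values, where=~mask)


# Assumed layout: an integer data array plus a boolean mask of the same shape.
values = np.array([1, 2, 3, 0], dtype="int64")
mask = np.array([False, False, False, True])

total = masked_sum(values, mask)
print(total)
```

Because the data stay in an integer array and the mask is precomputed, a function like this can skip the dtype handling and NaN detection that a general float-oriented routine has to do.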