ENH/PERF: enable column-wise reductions for EA-backed columns by jorisvandenbossche · Pull Request #32867 · pandas-dev/pandas

Currently, for reductions on a DataFrame, we convert the full DataFrame to a single "interleaved" array and then perform the operation. That's the default, but when numeric_only=True is specified, it is done block-wise.
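To make the "interleaved" conversion concrete, here is a small sketch using only public API. The frames and column names are made up for illustration. It shows that combining all columns into one 2D array forces a single common dtype, and that numeric_only=True restricts the reduction to the numeric columns.

```python
import numpy as np
import pandas as pd

# Illustrative frames; the column names are arbitrary.
numeric = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, 1.5, 2.5]})
mixed = numeric.assign(c=["x", "y", "z"])

# Interleaving: all columns are combined into one 2D array with a common dtype.
numeric_values = numeric.to_numpy()  # int64 + float64 columns -> float64
mixed_values = mixed.to_numpy()      # a string column forces object dtype

print(numeric_values.dtype, mixed_values.dtype)

# With numeric_only=True only the numeric columns take part in the reduction.
means = mixed.mean(numeric_only=True)
print(means)
```

The object-dtype case is the expensive one: once a frame is interleaved to object, the reduction can no longer use a fast numeric path.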

Enabling column-wise reductions (or block-wise for EAs):

For illustration purposes, I added a column_wise keyword in this PR (not meant to be kept, just for testing), so we can compare a few cases:

In [9]: df_wide = pd.DataFrame(np.random.randint(1000, size=(1000,100))).astype("Int64").copy() 

In [15]: %timeit df_wide.mean()   
9.68 ms ± 100 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [16]: %timeit df_wide.mean(numeric_only=True)
10.1 ms ± 345 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [17]: %timeit df_wide.mean(column_wise=True)  
5.22 ms ± 29.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [18]: df_long = pd.DataFrame(np.random.randint(1000, size=(10000,10))).astype("Int64").copy()  

In [19]: %timeit df_long.mean()
7.77 ms ± 115 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [20]: %timeit df_long.mean(numeric_only=True)         
2.07 ms ± 31.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [21]: %timeit df_long.mean(column_wise=True) 
1.04 ms ± 4.54 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

So I experimented with two approaches:

The first gives better performance (it is simpler in implementation because it does not involve the blocks), but requires more new code (it uses less of the existing machinery).

Ideally, for EA columns, we should always use their own reduction implementation (thus call EA._reduce), I think. So for both approaches, the question will be how to trigger this behaviour.
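As a rough sketch of the column-wise idea (not the implementation in this PR), each column can be reduced on its own through the public Series reductions, which for EA-backed columns dispatch to the array's own reduction. The helper name column_wise_reduce is hypothetical and exists only for this example.

```python
import numpy as np
import pandas as pd


def column_wise_reduce(df, name, skipna=True):
    """Hypothetical helper: reduce every column independently.

    Each column is reduced through the Series method called `name`
    (e.g. "mean", "sum"), so no interleaved 2D array is built.
    """
    results = [
        getattr(df.iloc[:, i], name)(skipna=skipna)  # positional access, safe with duplicate labels
        for i in range(df.shape[1])
    ]
    return pd.Series(results, index=df.columns)


rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 1000, size=(50, 4))).astype("Int64")

col_means = column_wise_reduce(df, "mean")
print(col_means)
```

Looping in Python costs something per column, so this sketch only shows the dispatch pattern. It is not meant to reproduce the timings above.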

Closes #32651, closes #34520