PERF: Faster transposition of frames with masked arrays by topper-123 · Pull Request #52836 · pandas-dev/pandas

Faster transpose of DataFrames with homogeneous masked arrays. This also helps reductions with axis=1, as those currently transpose the data before reducing.
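A minimal sketch of the case being targeted, assuming a small frame where every column is `Int64` (the values and the single missing entry are made up for illustration). It only shows the observable behaviour, namely that an `axis=1` reduction gives the same result as transposing and reducing along `axis=0`; it is not a description of the internal code path.

```python
import numpy as np
import pandas as pd

# Small frame with one masked dtype shared by every column (illustrative data)
values = np.arange(12).reshape(3, 4)
df = pd.DataFrame(values).astype("Int64")
df.iloc[0, 1] = pd.NA  # masked arrays can hold missing values

# Transposing swaps rows and columns
transposed = df.transpose()

# A row-wise reduction and the same reduction on the transposed frame
row_sums = df.sum(axis=1)
via_transpose = df.transpose().sum(axis=0)

print(transposed.shape)
print(row_sums.tolist())
print(via_transpose.tolist())
```

Both reductions skip the missing entry by default, so the two printed lists should match.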

Performance example:

In [1]: import pandas as pd, numpy as np
   ...: values = np.random.randn(100000, 4)
   ...: values = values.astype(int)
   ...: df = pd.DataFrame(values).astype("Int64")

%timeit df.transpose()
1.91 s ± 3.08 ms per loop  # main
563 ms ± 2.48 ms per loop  # this PR
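The timing above uses IPython's `%timeit`. Outside IPython, a rough equivalent with the standard-library `timeit` module might look like the sketch below. The helper name `time_transpose` and its parameters are hypothetical, and the numbers it reports will depend on the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd


def time_transpose(n_rows=100_000, n_cols=4, dtype="Int64", number=3):
    """Return the best per-call time in seconds for DataFrame.transpose()."""
    # Same construction as the example: random integers cast to a masked dtype
    values = np.random.randn(n_rows, n_cols).astype(int)
    df = pd.DataFrame(values).astype(dtype)
    # repeat() returns one total per run; divide by `number` for a per-call time
    totals = timeit.repeat(df.transpose, number=number, repeat=3)
    return min(totals) / number
```

Calling `time_transpose()` on `main` and on the PR branch gives two numbers that can be compared in the same way as the `%timeit` output.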

Running asv continuous -f 1.1 upstream/main HEAD -b reshape.ReshapeMaskedArrayDtype.time_transpose:

       before           after         ratio
     [c3f0aac1]       [43162428]
     <master>         <cln_managers>
-     13.5±0.06ms      1.82±0.01ms     0.14  reshape.ReshapeMaskedArrayDtype.time_transpose('Float64')
-     13.8±0.03ms      1.83±0.01ms     0.13  reshape.ReshapeMaskedArrayDtype.time_transpose('Int64')

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.
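For readers unfamiliar with asv, a benchmark of this kind is a class with `params`, a `setup` method and one or more `time_*` methods. The sketch below follows that convention; the class name `TransposeMaskedFrame` and the frame size are assumptions for illustration, not the definition of `reshape.ReshapeMaskedArrayDtype` in the pandas benchmark suite.

```python
import numpy as np
import pandas as pd


class TransposeMaskedFrame:
    # Hypothetical benchmark class; asv runs setup() and then each time_* method
    # once per parameter value.
    params = ["Int64", "Float64"]
    param_names = ["dtype"]

    def setup(self, dtype):
        # Frame size is an assumption, chosen only to keep the sketch quick
        values = np.random.randn(1_000, 10).astype(int)
        self.df = pd.DataFrame(values).astype(dtype)

    def time_transpose(self, dtype):
        # asv measures the wall-clock time of this call
        self.df.transpose()
```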

It may be possible to improve performance further when frames have a common masked dtype; I intend to look into that in a follow-up.