PERF: Faster transposition of frames with masked arrays by topper-123 · Pull Request #52836 · pandas-dev/pandas
Faster transpose of DataFrames with homogeneous masked arrays. This also helps reductions with axis=1,
as those currently transpose the data before reducing.
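To make the affected case concrete, here is a minimal sketch of a frame whose columns all share one masked dtype, its transpose, and a row-wise reduction. The helper name `build_masked_frame` and the sizes are made up for illustration and are not part of this PR; the check at the end only shows that a row-wise sum agrees with a column-wise sum on the transposed frame, not how pandas implements the reduction internally.

```python
import numpy as np
import pandas as pd


def build_masked_frame(n_rows=1_000, n_cols=4, dtype="Int64", seed=0):
    """Hypothetical helper: a frame where every column has the same masked dtype."""
    rng = np.random.default_rng(seed)
    values = rng.integers(-100, 100, size=(n_rows, n_cols))
    df = pd.DataFrame(values).astype(dtype)
    # Insert one missing value so the mask is actually exercised.
    df.iloc[0, 0] = pd.NA
    return df


df = build_masked_frame()

# Transposing a homogeneous masked frame keeps the masked dtype and the NA.
transposed = df.transpose()

# A row-wise reduction gives the same values as a column-wise reduction
# on the transposed frame (missing values are skipped in both cases).
row_sums = df.sum(axis=1)
col_sums_of_transpose = transposed.sum(axis=0)
```

Because the result of `transpose()` has one column per original row, the cost of building those columns dominates for tall frames, which is the case the timings in this PR target.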
Performance example:
In [1]: import pandas as pd, numpy as np
   ...: values = np.random.randn(100000, 4)
   ...: values = values.astype(int)
   ...: df = pd.DataFrame(values).astype("Int64")
%timeit df.transpose()
1.91 s ± 3.08 ms per loop  # main
563 ms ± 2.48 ms per loop  # this PR
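For anyone who wants to reproduce the comparison outside IPython, the following is a rough sketch using the standard-library `timeit` module instead of `%timeit`. The function name `best_transpose_time` and its default sizes are assumptions chosen for this example; absolute numbers depend on the machine and the pandas build, so only the relative difference between two builds is meaningful.

```python
import timeit

import numpy as np
import pandas as pd


def best_transpose_time(n_rows=10_000, n_cols=4, dtype="Int64", repeat=3, number=1):
    """Hypothetical helper: best wall-clock time (seconds) for one transpose."""
    # Same construction as the example above, with a configurable size.
    values = np.random.randn(n_rows, n_cols).astype(int)
    df = pd.DataFrame(values).astype(dtype)
    # timeit.repeat returns the total time per run; take the fastest run.
    timings = timeit.repeat(df.transpose, repeat=repeat, number=number)
    return min(timings) / number


if __name__ == "__main__":
    for dtype in ("Int64", "Float64"):
        seconds = best_transpose_time(dtype=dtype)
        print(f"{dtype}: {seconds * 1e3:.1f} ms per transpose")
```

Running the script once on main and once on this branch should give a comparison in the same spirit as the numbers above.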
Running asv continuous -f 1.1 upstream/main HEAD -b reshape.ReshapeMaskedArrayDtype.time_transpose:
       before           after         ratio
     [c3f0aac1]       [43162428]
     <master>         <cln_managers>
-     13.5±0.06ms      1.82±0.01ms     0.14  reshape.ReshapeMaskedArrayDtype.time_transpose('Float64')
-     13.8±0.03ms      1.83±0.01ms     0.13  reshape.ReshapeMaskedArrayDtype.time_transpose('Int64')
SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.
It may be possible to improve performance when frames have a common masked dtype; I intend to look into that in a follow-up.