PERF: axis=1 reductions with EA dtypes by lukemanley · Pull Request #54341 · pandas-dev/pandas

Wow, this is quite clever! On my AMD Ryzen 9 5950X machine, I get:

df = pd.DataFrame(np.random.randn(10000, 4), dtype="float64[pyarrow]")
%timeit df.sum(axis=1)
698 ms ± 2.82 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)  <-- main
929 µs ± 4.33 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)  <-- PR

df = pd.DataFrame(np.random.randn(10000, 4), dtype="Float64")
%timeit df.sum(axis=1)
401 ms ± 1.45 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)  <-- main
1.16 ms ± 2.52 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)  <-- PR

Also, for wide inputs, there does not appear to be a slowdown:

df = pd.DataFrame(np.random.randn(4, 10000), dtype="float64[pyarrow]")
%timeit df.sum(axis=1)
29 ms ± 157 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)  <-- main
28.2 ms ± 440 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)  <-- PR

df = pd.DataFrame(np.random.randn(4, 10000), dtype="Float64")
%timeit df.sum(axis=1)
22.9 ms ± 48.3 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) <-- main
22.4 ms ± 273 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) <-- PR
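
For anyone who wants to reproduce these timings outside IPython, here is a rough sketch using the standard `timeit` module. The helper name `time_axis1_sum` and the `number`/`repeat` defaults are my own choices for illustration, not anything from the PR. It uses "Float64" by default so pyarrow is not required; pass "float64[pyarrow]" if you have it installed. It also checks the result against a plain NumPy row sum.

```python
import timeit

import numpy as np
import pandas as pd


def time_axis1_sum(shape, dtype="Float64", number=3, repeat=3):
    """Return (best seconds per call, result) for df.sum(axis=1).

    Hypothetical helper for illustration; not part of pandas or the PR.
    """
    values = np.random.randn(*shape)
    df = pd.DataFrame(values, dtype=dtype)

    # Take the minimum over repeats to reduce noise from other processes.
    runs = timeit.repeat(lambda: df.sum(axis=1), number=number, repeat=repeat)
    best = min(runs) / number

    # Sanity check: the EA-backed result should match a NumPy row sum.
    result = df.sum(axis=1)
    expected = values.sum(axis=1)
    assert np.allclose(result.to_numpy(dtype="float64"), expected)
    return best, result


if __name__ == "__main__":
    # Same tall and wide shapes as in the timings above.
    for shape in [(10000, 4), (4, 10000)]:
        seconds, _ = time_axis1_sum(shape)
        print(f"shape={shape}: {seconds * 1e3:.3f} ms per call")
```

Absolute numbers will differ by machine and pandas version, so the useful comparison is running the same script on main and on the PR branch.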

Edit:

I also ran the ASV benchmarks for frame_methods and stat_ops. frame_methods showed no change; stat_ops results:

       before           after         ratio
     [c93e8034]       [8335a189]
     <main>           <perf-axis1-ea-reductions>
-      4.25±0.03s       10.2±0.3ms     0.00  stat_ops.FrameOps.time_op('sum', 'Int64', 1)
-         4.24±0s       10.0±0.3ms     0.00  stat_ops.FrameOps.time_op('prod', 'Int64', 1)
-      6.85±0.06s       13.0±0.3ms     0.00  stat_ops.FrameOps.time_op('skew', 'Int64', 1)
-      7.66±0.03s       13.7±0.3ms     0.00  stat_ops.FrameOps.time_op('median', 'Int64', 1)
-      7.46±0.04s       10.6±0.3ms     0.00  stat_ops.FrameOps.time_op('var', 'Int64', 1)
-      7.61±0.04s       10.8±0.3ms     0.00  stat_ops.FrameOps.time_op('std', 'Int64', 1)
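
The 0.00 in the ratio column is just rounding: as far as I understand, asv reports after/before to two decimal places. A quick sketch that recomputes the ratio and the approximate speedup from the mean timings in the table above (the ± spread is dropped, and `ratio_and_speedup` is a name I made up for this example):

```python
# Mean timings copied from the ASV table above, converted to seconds.
# The ± spread is dropped, so the speedups are only approximate.
timings = {
    "sum": (4.25, 10.2e-3),
    "prod": (4.24, 10.0e-3),
    "skew": (6.85, 13.0e-3),
    "median": (7.66, 13.7e-3),
    "var": (7.46, 10.6e-3),
    "std": (7.61, 10.8e-3),
}


def ratio_and_speedup(before, after):
    """Return (after / before, before / after) for one benchmark."""
    return after / before, before / after


for op, (before, after) in timings.items():
    ratio, speedup = ratio_and_speedup(before, after)
    print(f"{op:>6}: ratio={ratio:.4f}  speedup ~{speedup:.0f}x")
```

Every ratio comes out below 0.005, which is why the table shows 0.00 for all six operations.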