PERF: axis=1 reductions with EA dtypes by lukemanley · Pull Request #54341 · pandas-dev/pandas
Wow - this is quite clever! On my AMD Ryzen 9 5950X machine, I get:
df = pd.DataFrame(np.random.randn(10000, 4), dtype="float64[pyarrow]")
%timeit df.sum(axis=1)
698 ms ± 2.82 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) <-- main
929 µs ± 4.33 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each) <-- PR
df = pd.DataFrame(np.random.randn(10000, 4), dtype="Float64")
%timeit df.sum(axis=1)
401 ms ± 1.45 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) <-- main
1.16 ms ± 2.52 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each) <-- PR
Also, for wide inputs, there appears to be no slowdown:
df = pd.DataFrame(np.random.randn(4, 10000), dtype="float64[pyarrow]")
%timeit df.sum(axis=1)
29 ms ± 157 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) <-- main
28.2 ms ± 440 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) <-- PR
df = pd.DataFrame(np.random.randn(4, 10000), dtype="Float64")
%timeit df.sum(axis=1)
22.9 ms ± 48.3 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) <-- main
22.4 ms ± 273 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) <-- PR
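For anyone who wants to rerun the timings above outside IPython, here is a minimal sketch using the standard-library timeit module. The helper name time_row_sum and its repeat/number defaults are my own assumptions, not something from the PR, and absolute numbers will differ by machine and pandas version. The "float64[pyarrow]" case needs pyarrow installed.

```python
import timeit

import numpy as np
import pandas as pd


def time_row_sum(shape, dtype, repeat=3, number=1):
    """Return the best per-call time in seconds for df.sum(axis=1).

    Hypothetical helper for illustration; not part of pandas or the PR.
    """
    # Same construction as in the timings above: random floats with an EA dtype.
    df = pd.DataFrame(np.random.randn(*shape), dtype=dtype)
    timer = timeit.Timer(lambda: df.sum(axis=1))
    # Take the minimum over repeats to reduce noise from other processes.
    return min(timer.repeat(repeat=repeat, number=number)) / number


if __name__ == "__main__":
    # Tall and wide shapes, matching the two cases measured above.
    for shape in [(10000, 4), (4, 10000)]:
        for dtype in ["float64[pyarrow]", "Float64"]:
            seconds = time_row_sum(shape, dtype)
            print(f"{shape} {dtype}: {seconds * 1e3:.2f} ms")
```

Running the script once on main and once on the PR branch should give a comparable before/after picture, though %timeit picks its own loop counts, so the figures will not match exactly.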
Edit: I also ran ASVs for frame_methods and stat_ops. frame_methods showed no change; stat_ops:
       before           after         ratio
     [c93e8034]       [8335a189]
     <main>           <perf-axis1-ea-reductions>
-      4.25±0.03s       10.2±0.3ms     0.00  stat_ops.FrameOps.time_op('sum', 'Int64', 1)
-         4.24±0s       10.0±0.3ms     0.00  stat_ops.FrameOps.time_op('prod', 'Int64', 1)
-      6.85±0.06s       13.0±0.3ms     0.00  stat_ops.FrameOps.time_op('skew', 'Int64', 1)
-      7.66±0.03s       13.7±0.3ms     0.00  stat_ops.FrameOps.time_op('median', 'Int64', 1)
-      7.46±0.04s       10.6±0.3ms     0.00  stat_ops.FrameOps.time_op('var', 'Int64', 1)
-      7.61±0.04s       10.8±0.3ms     0.00  stat_ops.FrameOps.time_op('std', 'Int64', 1)