ENH/PERF: pyarrow timestamp & duration conversion consistency/performance by lukemanley · Pull Request #53326 · pandas-dev/pandas
Submitting as a single PR because a number of tests require consistent behavior across these methods, and separating the nano/non-nano behavior changes from the performance improvements is tricky.
import pandas as pd
import pyarrow as pa
N = 1_000_000
arr = pd.array(range(N), dtype=pd.ArrowDtype(pa.timestamp("s")))
%timeit arr.astype("M8[s]")
# 5.29 s ± 162 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) -> main
# 137 µs ± 4.88 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each) -> PR
%timeit pd.DatetimeIndex(arr)
# 6.31 s ± 560 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) -> main
# 67.6 µs ± 3.03 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each) -> PR