PERF: faster constructors from ea scalars by lukemanley · Pull Request #45854 · pandas-dev/pandas
Performance improvement when constructing a DataFrame or Series from a scalar value with an extension array (EA) dtype.
import pandas as pd
N = 1_000_000
%timeit pd.DataFrame({"A": pd.NA, "B": 1.0}, index=range(N), dtype=pd.Float64Dtype())
2.18 s ± 214 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) <- main
22.4 ms ± 2.81 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) <- PR
%timeit pd.Series(pd.NA, index=range(N), dtype=pd.Float64Dtype())
1.62 s ± 50.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) <- main
7.33 ms ± 1.05 ms per loop (mean ± std. dev. of 7 runs, 100 loops each) <- PR
%timeit pd.Series(1, index=range(N), dtype='Int64')
242 ms ± 21.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) <- main
11.6 ms ± 231 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) <- PR
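For intuition about where a speedup of this size can come from, here is a minimal sketch, not the code in this PR. It assumes the cost on main is dominated by handling the scalar element by element, and shows the alternative of allocating the data and mask buffers of a masked array in one vectorized step. `full_float64_ea` is a hypothetical helper named for illustration only; `pd.arrays.FloatingArray` is the public masked-array class backing `Float64Dtype`.

```python
import numpy as np
import pandas as pd


def full_float64_ea(value, length):
    """Hypothetical helper: Float64 masked array filled with a single scalar."""
    if value is pd.NA:
        # Every slot is missing: the data values are placeholders hidden by the mask.
        data = np.ones(length, dtype="float64")
        mask = np.ones(length, dtype=bool)
    else:
        # Every slot is valid: fill the data buffer in one vectorized call.
        data = np.full(length, value, dtype="float64")
        mask = np.zeros(length, dtype=bool)
    return pd.arrays.FloatingArray(data, mask)


n = 1_000
# Built from preallocated buffers.
from_buffers = pd.Series(full_float64_ea(pd.NA, n), index=range(n))
# Built through the regular scalar constructor path benchmarked above.
from_scalar = pd.Series(pd.NA, index=range(n), dtype=pd.Float64Dtype())
```

Both Series end up with dtype `Float64` and all values missing; the sketch only illustrates that the result can be produced without visiting each element in Python.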