PERF: improve performance of infer_dtype by topper-123 · Pull Request #51054 · pandas-dev/pandas

Rerunning the tests gives these results:

    import numpy as np
    import pandas as pd
    from pandas.api.types import infer_dtype

    base_arr = np.arange(10_000_000)

    x = np.array(base_arr)
    %timeit infer_dtype(x)
    596 ns ± 128 ns per loop     # main
    146 ns ± 8.73 ns per loop    # this PR, v1
    140 ns ± 0.0592 ns per loop  # this PR, v2

    x = pd.Series(base_arr)
    %timeit infer_dtype(x)
    6.83 µs ± 33.7 ns per loop   # main
    583 ns ± 5.4 ns per loop     # this PR, v1
    725 ns ± 5.24 ns per loop    # this PR, v2

    x = pd.Series(base_arr, dtype="Int32")
    %timeit infer_dtype(x)
    3.92 µs ± 9.9 ns per loop    # main
    816 ns ± 4.75 ns per loop    # this PR, v1
    765 ns ± 4.81 ns per loop    # this PR, v2

    x = pd.array(base_arr)
    %timeit infer_dtype(x)
    310 ns ± 0.704 ns per loop   # main
    582 ns ± 4.73 ns per loop    # this PR, v1
    426 ns ± 2.96 ns per loop    # this PR, v2
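To make the comparison easier to read, the timings above can be turned into speedup ratios relative to main. The following is only a small sketch: the numbers are copied from the results above (with µs converted to ns), and the `timings_ns` dictionary and the `speedup` helper are illustrative names, not part of pandas or this PR.

```python
# Mean timings copied from the benchmark results above, in nanoseconds.
# Tuple order: (main, this PR v1, this PR v2).
# The dictionary keys are just labels for how `x` was constructed.
timings_ns = {
    "np.array(base_arr)": (596.0, 146.0, 140.0),
    "pd.Series(base_arr)": (6830.0, 583.0, 725.0),
    'pd.Series(base_arr, dtype="Int32")': (3920.0, 816.0, 765.0),
    "pd.array(base_arr)": (310.0, 582.0, 426.0),
}


def speedup(before_ns, after_ns):
    """Return how many times faster `after_ns` is than `before_ns`.

    A value above 1 means the change is faster than the baseline;
    a value below 1 means it is slower.
    """
    return before_ns / after_ns


for label, (main, v1, v2) in timings_ns.items():
    print(
        f"{label:36s} "
        f"v1: {speedup(main, v1):5.2f}x   "
        f"v2: {speedup(main, v2):5.2f}x"
    )
```

A ratio below 1 marks a slowdown relative to main. With the numbers above, that is the case for `pd.array(base_arr)` in both v1 and v2, while the other three inputs are faster in both versions. The ratios use only the reported means and ignore the ± spread, so small differences between v1 and v2 should be read with some caution.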