PERF: CategoricalDtype.update_dtype by mroeschke · Pull Request #59647 · pandas-dev/pandas
If a `CategoricalDtype` is passed to `CategoricalDtype.update_dtype`, the method unnecessarily re-validates the passed dtype's categories whenever they are not `None`.

`CategoricalDtype.update_dtype` is called in constructors such as `Categorical.__init__` and `Categorical._simple_new`, which use it to update the passed dtype so that `ordered` becomes `False` if it was `None`. A fully validated `CategoricalDtype` (one whose categories and `ordered` are both set) should simply be returned as-is when passed to `update_dtype`.
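The short-circuit described above can be illustrated with a minimal pure-Python sketch. `SketchCategoricalDtype` and `validate_categories` are hypothetical stand-ins, not pandas classes or functions: the real validation does more work, and the real `update_dtype` may differ in detail. The sketch only shows the idea that a dtype with both fields set is handed back unchanged, while a partially specified dtype is merged with `self` and rebuilt (paying for validation again).

```python
from typing import Iterable, Optional


def validate_categories(categories: Iterable) -> tuple:
    """Hypothetical stand-in for pandas' category validation: O(n) per call."""
    cats = tuple(categories)
    if len(set(cats)) != len(cats):
        raise ValueError("Categorical categories must be unique")
    return cats


class SketchCategoricalDtype:
    """Hypothetical, simplified model of CategoricalDtype (not the pandas class)."""

    def __init__(self, categories: Optional[Iterable] = None, ordered: Optional[bool] = None):
        # Every construction with non-None categories pays for validation.
        self.categories = None if categories is None else validate_categories(categories)
        self.ordered = ordered

    def update_dtype(self, dtype: "SketchCategoricalDtype") -> "SketchCategoricalDtype":
        # Fast path: both fields are set, so the dtype was already validated
        # when it was built; return it unchanged instead of rebuilding it.
        if dtype.categories is not None and dtype.ordered is not None:
            return dtype
        # Slow path: fill any field left as None from self, then rebuild
        # (which re-validates the categories).
        new_categories = dtype.categories if dtype.categories is not None else self.categories
        new_ordered = dtype.ordered if dtype.ordered is not None else self.ordered
        return SketchCategoricalDtype(new_categories, new_ordered)


# Usage mirroring the benchmark: the base dtype only supplies ordered=False.
base_dtype = SketchCategoricalDtype(ordered=False)
cdtype = SketchCategoricalDtype(categories=range(100_000), ordered=True)
assert base_dtype.update_dtype(cdtype) is cdtype  # same object, no re-validation

partial = SketchCategoricalDtype(categories=["a", "b"])  # ordered left as None
merged = base_dtype.update_dtype(partial)
assert merged.ordered is False and merged.categories == ("a", "b")
```

In this sketch the fast path skips both the new allocation and the `set`-based uniqueness check, which is why its cost no longer grows with the number of categories.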
```python
In [1]: import pandas as pd

In [2]: cdtype = pd.CategoricalDtype(categories=list(range(100_000)), ordered=True)

In [3]: base_dtype = pd.CategoricalDtype(ordered=False)

In [4]: %timeit base_dtype.update_dtype(cdtype)  # before (main)
2.5 μs ± 11.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [4]: %timeit base_dtype.update_dtype(cdtype)  # after (this PR)
865 ns ± 2.26 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
```