PERF: ArrowExtensionArray.factorize by mroeschke · Pull Request #49177 · pandas-dev/pandas
In [1]: import pandas as pd
In [2]: import pyarrow as pa
In [3]: data = [1, 2, 3] * 5000 + [None] * 5000
In [4]: arr = pd.arrays.ArrowExtensionArray(pa.array(data))
In [5]: %timeit arr.factorize() # pr
138 µs ± 795 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
In [5]: %timeit arr.factorize() # main
665 µs ± 9.17 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
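
For readers less familiar with what `factorize` computes on the benchmark data, here is a minimal pure-Python sketch of its semantics: each value is replaced by an integer code pointing into a list of uniques in first-seen order, and missing values get the sentinel code `-1` (the pandas default). The helper name `factorize_sketch` is hypothetical, and this is not the pandas or pyarrow implementation touched by this PR; it only illustrates the expected result.

```python
from typing import Dict, Hashable, List, Optional, Sequence, Tuple


def factorize_sketch(
    values: Sequence[Optional[Hashable]],
) -> Tuple[List[int], List[Hashable]]:
    """Illustrative only: encode values as integer codes plus a list of uniques.

    None is treated as missing and coded as -1, mirroring the default
    sentinel used by pandas' factorize.
    """
    positions: Dict[Hashable, int] = {}  # value -> its code
    uniques: List[Hashable] = []         # uniques in first-seen order
    codes: List[int] = []
    for value in values:
        if value is None:
            codes.append(-1)             # missing values get the sentinel
            continue
        code = positions.get(value)
        if code is None:                 # first time this value is seen
            code = len(uniques)
            positions[value] = code
            uniques.append(value)
        codes.append(code)
    return codes, uniques


# Same data as in the benchmark above.
data = [1, 2, 3] * 5000 + [None] * 5000
codes, uniques = factorize_sketch(data)
```

With the benchmark data, `uniques` holds the three distinct integers and the trailing 5000 entries of `codes` are all the missing-value sentinel.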