PERF: ArrowExtensionArray.factorize by mroeschke · Pull Request #49177 · pandas-dev/pandas

In [1]: import pandas as pd

In [2]: import pyarrow as pa

In [3]: data = [1, 2, 3] * 5000 + [None] * 5000

In [4]: arr = pd.arrays.ArrowExtensionArray(pa.array(data))

In [5]: %timeit arr.factorize()  # this PR
138 µs ± 795 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [5]: %timeit arr.factorize()  # main
665 µs ± 9.17 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
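
For context, the timings above work out to roughly a 4.8x speedup (665 µs / 138 µs). For readers less familiar with what `factorize` computes, here is a minimal pure-Python sketch of the semantics on the same benchmark data. It is not the code in this PR or the pandas implementation. The helper name `factorize_sketch` is hypothetical, and the sketch assumes the default behaviour where missing values receive the sentinel code -1 and are left out of the uniques.

```python
def factorize_sketch(values):
    """Return (codes, uniques) for an iterable; None is treated as missing.

    Hypothetical helper for illustration only, not a pandas API.
    """
    uniques = []
    positions = {}  # value -> index into uniques, in order of first appearance
    codes = []
    for value in values:
        if value is None:
            codes.append(-1)  # assumed sentinel for missing values
            continue
        if value not in positions:
            positions[value] = len(uniques)
            uniques.append(value)
        codes.append(positions[value])
    return codes, uniques


# Same input as the benchmark above
data = [1, 2, 3] * 5000 + [None] * 5000
codes, uniques = factorize_sketch(data)
print(uniques)
print(codes[:6])
print(codes[-1])
```

Each code is the position of the corresponding value in `uniques`, so the original non-missing values can be recovered by indexing `uniques` with the codes.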