PERF: StringEngine for string dtype indexing ops by lukemanley · Pull Request #56997 · pandas-dev/pandas
StringEngine seems to be about 2x faster than ObjectEngine for string dtype indexing ops that use the hashmap:
import pandas as pd
N = 100_000
dtype = "string[pyarrow_numpy]"
strings = [f"i-{i}" for i in range(N)]
idx1 = pd.Index(strings[10:], dtype=dtype)
idx2 = pd.Index(strings[:-10], dtype=dtype)
%timeit idx1.get_indexer_for(idx2)
# 52.6 ms ± 834 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) -> main
# 25 ms ± 790 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) -> PR
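
For readers unfamiliar with what a hashmap-backed indexing op computes, here is a minimal pure-Python sketch of the result `get_indexer_for` produces for unique labels: the position of each target label in the index, or -1 when the label is missing. The helper name `get_indexer_sketch` is hypothetical and a plain `dict` stands in for the hash table; this illustrates the semantics only and is not how pandas implements its engines.

```python
def get_indexer_sketch(index_labels, target_labels):
    """Return the position of each target label in index_labels, or -1.

    Hypothetical helper for illustration; assumes index_labels are unique.
    """
    # Build the label -> position hashmap once (O(len(index_labels))).
    positions = {label: pos for pos, label in enumerate(index_labels)}
    # Each lookup is then an average O(1) hash probe.
    return [positions.get(label, -1) for label in target_labels]


# Same data layout as the benchmark above, using plain lists of str.
N = 100_000
strings = [f"i-{i}" for i in range(N)]
idx1 = strings[10:]   # labels i-10 .. i-99999
idx2 = strings[:-10]  # labels i-0 .. i-99989

result = get_indexer_sketch(idx1, idx2)
print(result[:12])  # first ten labels are absent from idx1, so -1
```

The cost of such an op is dominated by hashing and comparing the labels, which is presumably why an engine specialised for strings can outperform a generic object-based one.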