PERF: accessing sliced indexes with populated indexing engines by topper-123 · Pull Request #51738 · pandas-dev/pandas

Improves performance of indexes that are sliced from indexes with already-built indexing engines by copying the relevant data from the existing indexing engine, thereby avoiding recomputation.
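
To illustrate the general idea only (this is not the pandas implementation, which lives in the Cython engine code), here is a minimal pure-Python sketch. The class `CachedSeq` and its attributes are hypothetical names invented for this example. It shows how cached answers can be carried over to a slice instead of being recomputed:

```python
from typing import Optional, Sequence


class CachedSeq:
    """Hypothetical toy stand-in for an index with lazily cached properties."""

    def __init__(self, values: Sequence[int]) -> None:
        self.values = list(values)
        self._is_unique: Optional[bool] = None  # None means "not computed yet"
        self._is_mono_inc: Optional[bool] = None

    @property
    def is_unique(self) -> bool:
        if self._is_unique is None:  # O(n) work happens only on first access
            self._is_unique = len(set(self.values)) == len(self.values)
        return self._is_unique

    @property
    def is_monotonic_increasing(self) -> bool:
        if self._is_mono_inc is None:
            v = self.values
            self._is_mono_inc = all(a <= b for a, b in zip(v, v[1:]))
        return self._is_mono_inc

    def __getitem__(self, key: slice) -> "CachedSeq":
        out = CachedSeq(self.values[key])
        # Any slice of a unique sequence is unique, so a cached True carries over.
        if self._is_unique:
            out._is_unique = True
        # Order is preserved only for forward slices (step omitted or positive).
        if self._is_mono_inc and (key.step is None or key.step > 0):
            out._is_mono_inc = True
        return out


parent = CachedSeq(range(10))
parent.is_unique, parent.is_monotonic_increasing  # populate the parent's cache
child = parent[2:5]  # child starts with both answers already cached
```

The sketch only propagates answers that are guaranteed to stay valid for the slice; anything else is left uncomputed and falls back to the slow path on first access.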

Performance example:

    import numpy as np
    import pandas as pd

    idx = pd.Index(np.arange(1_000_000))
    idx.is_unique, idx.is_monotonic_increasing  # building the engine
    (True, True)

    %timeit idx[:].is_unique
    13.9 ms ± 78.8 µs per loop  # main
    2.76 µs ± 9.74 ns per loop  # this PR

    %timeit idx[:].is_monotonic_increasing
    4.26 ms ± 1.21 µs per loop  # main
    2.7 µs ± 3.9 ns per loop  # this PR

    %timeit idx[:].get_loc(999_999)
    4.26 ms ± 1.49 µs per loop  # main
    3.77 µs ± 41.7 ns per loop  # this PR
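
For anyone who wants to reproduce the comparison outside IPython, a rough equivalent using the standard-library `timeit` module might look like the following. The loop counts and the `timings` dict are arbitrary choices for this sketch, and absolute numbers will differ by machine and pandas version:

```python
import timeit

import numpy as np
import pandas as pd

idx = pd.Index(np.arange(1_000_000))
# Touch both properties once so the parent's indexing engine is populated.
assert idx.is_unique and idx.is_monotonic_increasing

statements = {
    "idx[:].is_unique": lambda: idx[:].is_unique,
    "idx[:].is_monotonic_increasing": lambda: idx[:].is_monotonic_increasing,
    "idx[:].get_loc(999_999)": lambda: idx[:].get_loc(999_999),
}

number = 20  # calls per timing run
timings = {}
for label, fn in statements.items():
    # Best of 3 runs, converted to seconds per single call.
    timings[label] = min(timeit.repeat(fn, number=number, repeat=3)) / number
    print(f"{label}: {timings[label] * 1e6:.2f} µs per call")
```

Running the script once on main and once on this branch gives the two columns of the comparison.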

Not sure how to test this, as the relevant attributes are in Cython code, and I don't think we currently have tests for the indexing engines?
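
One possible approach, sketched here as a suggestion rather than an existing pandas test, is to check behavior through the public API only: a slice taken from an index with a populated engine should answer the same as an index built from scratch over the same values. The function name and the choice of slices are hypothetical:

```python
import numpy as np
import pandas as pd


def check_sliced_index_matches_fresh_index(n: int = 1_000) -> None:
    idx = pd.Index(np.arange(n))
    # Populate the parent's engine before slicing.
    _ = (idx.is_unique, idx.is_monotonic_increasing)

    keys = (
        slice(None),
        slice(10, 500),
        slice(None, None, 2),
        slice(None, None, -1),
    )
    for key in keys:
        sliced = idx[key]
        # Built from a copied array, so nothing is inherited from the parent.
        fresh = pd.Index(idx.to_numpy()[key].copy())
        assert sliced.is_unique == fresh.is_unique
        assert sliced.is_monotonic_increasing == fresh.is_monotonic_increasing
        assert sliced.is_monotonic_decreasing == fresh.is_monotonic_decreasing
        last = fresh[-1]
        assert sliced.get_loc(last) == fresh.get_loc(last)


check_sliced_index_matches_fresh_index()
```

This does not inspect the engine attributes themselves, so it would only catch incorrect copied state, not confirm that the fast path was actually taken.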