PERF: accessing sliced indexes with populated indexing engines by topper-123 · Pull Request #51738 · pandas-dev/pandas
Improves performance of indexes that are sliced from indexes with already-built indexing engines by copying the relevant data from the existing indexing engine, thereby avoiding recomputation.
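As a rough illustration of the idea (not the actual Cython implementation in this PR), the sketch below uses a hypothetical `ToyEngine` class that caches `is_unique` and `is_monotonic_increasing` and passes already-known results on to a sliced copy. The class name, the `sliced` method, and the `computations` counter are all assumptions made for this example.

```python
from typing import Optional

import numpy as np


class ToyEngine:
    """Hypothetical stand-in for an indexing engine; not pandas' real class."""

    def __init__(self, values: np.ndarray) -> None:
        self.values = values
        self._unique: Optional[bool] = None          # None means "not computed yet"
        self._monotonic_inc: Optional[bool] = None
        self.computations = 0                        # counts full passes over the data

    @property
    def is_unique(self) -> bool:
        if self._unique is None:
            self.computations += 1
            self._unique = len(np.unique(self.values)) == len(self.values)
        return self._unique

    @property
    def is_monotonic_increasing(self) -> bool:
        if self._monotonic_inc is None:
            self.computations += 1
            # Non-strict comparison: repeated values still count as increasing.
            self._monotonic_inc = bool(np.all(self.values[1:] >= self.values[:-1]))
        return self._monotonic_inc

    def sliced(self, slobj: slice) -> "ToyEngine":
        """Return an engine for values[slobj], reusing results that still hold."""
        new = ToyEngine(self.values[slobj])
        reverse = slobj.step is not None and slobj.step < 0
        if self._unique:
            # Any slice of a unique sequence is itself unique.
            new._unique = True
        if self._monotonic_inc and not reverse:
            # A forward slice of an increasing sequence stays increasing.
            new._monotonic_inc = True
        return new


parent = ToyEngine(np.arange(1_000))
parent.is_unique, parent.is_monotonic_increasing     # populate the parent's cache
child = parent.sliced(slice(None))
print(child.is_unique, child.is_monotonic_increasing, child.computations)
```

The sketch only propagates results that are guaranteed to hold for the slice: a negative answer on the parent, or a reversed slice, leaves the child's value unknown so that it is computed on demand.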
Performance example:
import numpy as np
import pandas as pd

idx = pd.Index(np.arange(1_000_000))
idx.is_unique, idx.is_monotonic_increasing  # building the engine
(True, True)

%timeit idx[:].is_unique
13.9 ms ± 78.8 µs per loop  # main
2.76 µs ± 9.74 ns per loop  # this PR

%timeit idx[:].is_monotonic_increasing
4.26 ms ± 1.21 µs per loop  # main
2.7 µs ± 3.9 ns per loop  # this PR

%timeit idx[:].get_loc(999_999)
4.26 ms ± 1.49 µs per loop  # main
3.77 µs ± 41.7 ns per loop  # this PR
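For anyone wanting to reproduce the comparison outside IPython, a minimal sketch using the standard-library `timeit` module might look like the following. The `bench` helper and the loop count of 20 are assumptions chosen for this example, and the absolute numbers will depend on the machine and the pandas version installed.

```python
import timeit

import numpy as np
import pandas as pd

idx = pd.Index(np.arange(1_000_000))
# Access both properties once so the parent index's engine is populated.
assert idx.is_unique and idx.is_monotonic_increasing


def bench(stmt: str, number: int = 20) -> float:
    """Return the mean time in seconds for one execution of stmt."""
    total = timeit.timeit(stmt, globals={"idx": idx}, number=number)
    return total / number


for stmt in (
    "idx[:].is_unique",
    "idx[:].is_monotonic_increasing",
    "idx[:].get_loc(999_999)",
):
    print(f"{stmt:35s} {bench(stmt) * 1e6:12.2f} µs per loop")
```

Running the script once on main and once on this branch gives the two columns of the comparison.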
Not sure how to test this, as the relevant attributes are in Cython code, and I don't think we currently have tests for indexing engines?