PERF: speed up IntervalIndex._intersection_non_unique by ~50x by qwhelan · Pull Request #27489 · pandas-dev/pandas

I've been backfilling asv data and noticed the following regression in IntervalIndexMethod.time_intersection_both_duplicate (see here):
[Image: screenshot of the regression, 2019-07-20 02-30-17]

The regression went unnoticed because the benchmark was added in #26711, after the regression had already been introduced in #26225.

This PR both simplifies the IntervalIndex._intersection_non_unique logic (now equivalent to MultiIndex._intersection_non_unique) and provides a ~50x speedup:
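
To illustrate the general shape of a set-based non-unique intersection, here is a minimal sketch. It is not the code from this PR: the function name `intersection_non_unique` is hypothetical, and intervals are modelled as plain `(left, right)` tuples rather than an `IntervalIndex`. The idea is to hash the other side's endpoint pairs once, then keep every element of the left side (duplicates included) that appears in that set.

```python
def intersection_non_unique(self_intervals, other_intervals):
    """Keep each interval of `self_intervals` that also occurs in `other_intervals`.

    Hypothetical stand-in for illustration only: intervals are (left, right)
    tuples, and duplicates on the left side are preserved in order.
    """
    # Build the lookup set once so each membership test is O(1) on average.
    other_set = set(other_intervals)
    return [interval for interval in self_intervals if interval in other_set]


left = [(0, 1), (0, 1), (1, 2), (2, 3)]
right = [(0, 1), (3, 4), (2, 3), (2, 3)]
result = intersection_non_unique(left, right)
```

With this approach the cost is roughly linear in the combined length of both inputs, since each element is hashed and looked up once. NaN handling, which a real index would need, is left out of this sketch.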

       before           after         ratio
     [9bab81e0]       [2848036e]
     <interval_non_unique_intersection~1>       <interval_non_unique_intersection>
-      12.6±0.1ms         725±30μs     0.06  index_object.IntervalIndexMethod.time_intersection_both_duplicate(1000)
-         4.96±0s         96.7±6ms     0.02  index_object.IntervalIndexMethod.time_intersection_both_duplicate(100000)
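
For reading the table: asv's `ratio` column is after/before, so the speedup is its reciprocal. The short calculation below uses only the timings reported above, with the dictionary layout and the `speedup` helper being illustrative names. The ~50x figure comes from the 100000-element case, while the 1000-element case improves by roughly 17x.

```python
# Timings in seconds, copied from the asv output above: (before, after).
timings = {
    1000: (12.6e-3, 725e-6),
    100000: (4.96, 96.7e-3),
}


def speedup(before, after):
    """Return how many times faster `after` is than `before`."""
    return before / after


speedups = {n: speedup(before, after) for n, (before, after) in timings.items()}
ratios = {n: after / before for n, (before, after) in timings.items()}

for n in timings:
    print(f"n={n}: ratio={ratios[n]:.2f}, speedup={speedups[n]:.1f}x")
```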

The new numbers are about 10x faster than the old baseline.