PERF: Use RangeIndex properties to compute max/min · Issue #17607 · pandas-dev/pandas

Problem description

Currently, RangeIndex.max and RangeIndex.min fall back to nanops.nanmax and nanops.nanmin, but these values can be determined more efficiently using properties of RangeIndex.

I don't imagine that these are used very frequently, but the implementation is straightforward and appears to yield a good performance boost, so it seems worthwhile. I wanted to check here first before putting in too much effort, though.

Did a preliminary implementation and created some asv benchmarks:

      before           after         ratio
     [b59f107a]       [85c7fef1]
-        29.3±0ms      1.74±0.06μs     0.00  index_object.Range.time_max
-        27.3±1ms      1.39±0.06μs     0.00  index_object.Range.time_min
-        24.6±0ms      1.18±0.08μs     0.00  index_object.Range.time_min_trivial
-      27.3±0.7ms      1.15±0.04μs     0.00  index_object.Range.time_max_trivial

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.

The benchmarks are generated from:

from pandas import RangeIndex


class Range(object):
    goal_time = 0.2

    def setup(self):
        # One increasing and one decreasing index, so both the trivial
        # and non-trivial paths are exercised for max and min.
        self.idx_inc = RangeIndex(start=0, stop=10**7, step=3)
        self.idx_dec = RangeIndex(start=10**7, stop=-1, step=-3)

    def time_max(self):
        self.idx_inc.max()

    def time_max_trivial(self):
        self.idx_dec.max()

    def time_min(self):
        self.idx_dec.min()

    def time_min_trivial(self):
        self.idx_inc.min()

Note that the _trivial suffix denotes a fastpath for when the max/min is just the _start value (e.g. the minimum value of an increasing RangeIndex is just _start). The opposite extreme cannot simply be read off _stop: the stop is exclusive, and the last element depends on how the range aligns with _step (e.g. RangeIndex(0, 10, 3) contains 0, 3, 6, 9, so its maximum is 9, not 10).
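To make the idea concrete, here is a minimal sketch of how the extremes could be derived from start, stop and step alone. It uses Python's built-in range rather than RangeIndex, and range_min_max is a hypothetical helper written for illustration; it is not the preliminary implementation benchmarked above and may differ from whatever ends up in pandas.

```python
def range_min_max(start, stop, step):
    """Return (min, max) of range(start, stop, step) without scanning values.

    Hypothetical helper for illustration only; not the pandas implementation.
    """
    if step == 0:
        raise ValueError("step must be nonzero")
    # Number of elements; len() on a range is computed arithmetically,
    # so no values are materialized.
    n = len(range(start, stop, step))
    if n == 0:
        raise ValueError("an empty range has no min or max")
    # Last element actually contained in the range (stop is exclusive).
    last = start + step * (n - 1)
    # Trivial side: start is the min when increasing, the max when decreasing.
    return (start, last) if step > 0 else (last, start)
```

For the example in the note, range_min_max(0, 10, 3) returns (0, 9). Both results come from a constant amount of arithmetic rather than a pass over the values, which is presumably where the speedup in the benchmarks comes from.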