PERF: Use RangeIndex properties to compute max/min · Issue #17607 · pandas-dev/pandas (original) (raw)
Problem description
Currently RangeIndex.max
and RangeIndex.min
fallback to nanops.nanmax
and nanops.nanmin
, but it's possible to determine these more efficiently using properties of RangeIndex
.
I don't imagine that these are used very frequently, but the implementation is straightforward and appears to yield a good performance boost, so seems worthwhile. Wanted to check here first before putting in too much effort though.
Did a preliminary implementation and created some asv benchmarks:
before after ratio
[b59f107a] [85c7fef1]
- 29.3±0ms 1.74±0.06μs 0.00 index_object.Range.time_max
- 27.3±1ms 1.39±0.06μs 0.00 index_object.Range.time_min
- 24.6±0ms 1.18±0.08μs 0.00 index_object.Range.time_min_trivial
- 27.3±0.7ms 1.15±0.04μs 0.00 index_object.Range.time_max_trivial
SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
Where the benchmarks are generated from:
class Range(object): goal_time = 0.2
def setup(self):
self.idx_inc = RangeIndex(start=0, stop=10**7, step=3)
self.idx_dec = RangeIndex(start=10**7, stop=-1, step=-3)
def time_max(self):
self.idx_inc.max()
def time_max_trivial(self):
self.idx_dec.max()
def time_min(self):
self.idx_dec.min()
def time_min_trivial(self):
self.idx_inc.min()
Note that the _trivial
suffix denotes a fastpath for when the max/min are just the _start
value (e.g. the minimum value of an increasing RangeIndex
is just _start
). This isn't necessarily the case with _stop
, as it may not be included if misaligned with _step
(e.g. RangeIndex(0, 10, 3)
).