PERF: Index.getitem performance issue · Issue #6370 · pandas-dev/pandas
Once again, caused by #6328 investigation.
There's something very strange with how Index objects handle slices:
In [1]: import pandas.util.testing as tm
In [2]: idx = tm.makeStringIndex(1000000)
In [3]: timeit idx[:-1]
100000 loops, best of 3: 2 µs per loop
In [4]: timeit idx[slice(None,-1)]
100 loops, best of 3: 6.5 ms per loop
Obviously, this happens because Index doesn't override the __getslice__ provided by ndarray, hence idx[:-1] is executed via ndarray.__getslice__ -> Index.__array_finalize__, while idx[slice(None, -1)] goes via Index.__getitem__ -> Index.__new__.
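To make the two dispatch paths concrete, here is a minimal sketch using a hypothetical DispatchProbe class (not pandas or NumPy code). It assumes the Python 2 rule that a simple obj[i:j] slice is routed to __getslice__ when the class defines it; Python 3 removed __getslice__, so there both spellings reach __getitem__.

```python
class DispatchProbe(object):
    """Hypothetical probe: reports which special method handled the lookup."""

    def __len__(self):
        # Python 2 uses the length to resolve negative bounds for __getslice__.
        return 10

    def __getslice__(self, start, stop):
        # Only Python 2 calls this, and only for simple obj[i:j] slices.
        return "__getslice__"

    def __getitem__(self, key):
        # Explicit slice objects always arrive here, on any Python version.
        return "__getitem__"


probe = DispatchProbe()
via_syntax = probe[:-1]              # slice syntax
via_object = probe[slice(None, -1)]  # explicit slice object
print(via_syntax, via_object)
```

On Python 2 the two lookups should report different methods, which mirrors the split described above; on Python 3 they report the same one.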
__getitem__ is made more than 1000x slower (6.5 ms vs. 2 µs above) by trying to infer the data type of the sliced result and convert it to a different subclass. The problem is that the interactive invocation idx[:-1], where the milliseconds-vs-microseconds difference doesn't matter, is likely to miss this feature because it's dispatched via __getslice__. But for the programmatic invocation idx[slice(None, -1)], which does hit this soft spot, I'd argue that the type-conversion magic is not necessary at all.
Is there a rationale behind this?
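One way to picture the alternative is a slice fast path inside __getitem__. The sketch below uses a hypothetical TypedSeq class with a made-up _from_slice helper. It is not how pandas implements Index; it only illustrates the idea under the assumption that a slice of an already-typed container can reuse the parent's type instead of inferring it again.

```python
class TypedSeq(object):
    """Hypothetical stand-in for an Index-like container (not pandas code)."""

    inference_runs = 0  # counts how often the expensive constructor path runs

    def __init__(self, values):
        # Expensive path: scan every element to infer a common kind.
        TypedSeq.inference_runs += 1
        values = list(values)
        self.kind = "int" if all(isinstance(v, int) for v in values) else "object"
        self.values = values

    @classmethod
    def _from_slice(cls, values, kind):
        # Cheap path (hypothetical helper): reuse the known kind, skip inference.
        obj = cls.__new__(cls)
        obj.values = values
        obj.kind = kind
        return obj

    def __len__(self):
        return len(self.values)

    def __getitem__(self, key):
        if isinstance(key, slice):
            # A slice cannot change the element kind, so bypass __init__.
            return self._from_slice(self.values[key], self.kind)
        return self.values[key]


seq = TypedSeq(range(10))
head = seq[slice(None, -1)]  # programmatic slice takes the cheap path
```

With this shape, inference runs once when seq is built and not again for head, so the programmatic spelling no longer pays for type inference.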