PERF: optimize Index.delete for dtype=object by immerrr · Pull Request #7040 · pandas-dev/pandas
This should close #6933.
The downside of this patch is that we lose the type inference that, on rare occasions, could change the index type when index items have different dtypes. This is the same issue that came up in #6440, where it was judged infrequent enough to be worth dropping for the extra performance.
Note, though, that the benchmark results are not exactly what I expected:
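To make the trade-off concrete, here is a small sketch using plain NumPy only. It is not the code from this patch; the array contents and variable names are made up for illustration. It contrasts deleting directly from an object array (dtype stays `object`) with rebuilding the array from the remaining items (NumPy re-infers a narrower dtype).

```python
import numpy as np

# Illustrative object-dtype array with mixed contents: two ints and one string.
values = np.array([1, 2, "a"], dtype=object)

# Direct deletion: np.delete copies the remaining elements and keeps dtype=object.
fast = np.delete(values, -1)

# Re-inference: rebuilding from a plain list lets NumPy choose an integer dtype,
# because only ints are left once the string is gone.
inferred = np.array(values[:-1].tolist())

print(fast.dtype)           # object
print(inferred.dtype.kind)  # i
```

The first path skips the extra pass over the data, which is where the speed-up comes from; the second is the kind of inference that is given up.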
In [39]: idx = tm.makeStringIndex(100000)
In [40]: timeit idx.delete(-1)
1000 loops, best of 3: 1.46 ms per loop
In [41]: timeit np.delete(idx, -1)
1000 loops, best of 3: 1.05 ms per loop
The result is faster, but not by as much as I'd expected. This is probably due to object refcounting overhead, since MultiIndex fares a lot better with its Categorical-like implementation:
In [42]: midx = pd.MultiIndex.from_product([[0], idx])
<..snip..>
In [45]: timeit midx.delete(-1)
1000 loops, best of 3: 225 µs per loop
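A rough sketch of why a Categorical-like layout can sidestep the refcounting cost, again in plain NumPy. This is a simplified assumption about the general idea (unique labels plus integer codes), not the actual MultiIndex internals; `levels` and `codes` are illustrative names.

```python
import numpy as np

# Hypothetical categorical-like layout: unique labels stored once,
# plus an integer code per entry pointing into them.
levels = np.array(["a", "b", "c"], dtype=object)
codes = np.array([0, 1, 2, 1, 0], dtype=np.int64)

# Deleting an entry only copies the integer codes. The object array of
# labels is left alone, so no Python objects are touched per element.
new_codes = np.delete(codes, -1)

# Labels are materialized from the codes only when they are needed.
labels = levels.take(new_codes)
print(labels.tolist())  # ['a', 'b', 'c', 'b']
```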