PERF: speed up MultiIndex.is_monotonic by 50x by qwhelan · Pull Request #27495 · pandas-dev/pandas (original) (raw)
The current logic for MultiIndex.is_monotonic
relies on np.lexsort()
on MultiIndex._values
. While the result is cached, this is slow as it triggers the creation of ._values
, needs to perform an O(n log(n))
sort, as well as populate the hashmap of a transient Index
.
This PR significantly speeds up this check by directly operating on .codes
when .levels
are individually sorted. This means we can leverage libalgos.is_lexsorted()
which is O(n)
(but has the downside of needing int64
when MultiIndex
compacts levels).
before after ratio
[5bd57f90] [3320dded]
- 31.9±0.9ms 627±50μs 0.02 index_cached_properties.IndexCache.time_is_monotonic_decreasing('MultiIndex')
- 29.6±0.7ms 528±80μs 0.02 index_cached_properties.IndexCache.time_is_monotonic('MultiIndex')
- 29.9±1ms 524±90μs 0.02 index_cached_properties.IndexCache.time_is_monotonic_increasing('MultiIndex')
- closes Join operation takes more time. #27099
- tests added / passed
- passes
black pandas
- passes
git diff upstream/master -u -- "*.py" | flake8 --diff
- whatsnew entry