PERF: speed up MultiIndex.is_monotonic by 50x by qwhelan · Pull Request #27495 · pandas-dev/pandas (original) (raw)

The current logic for MultiIndex.is_monotonic relies on np.lexsort() on MultiIndex._values. While the result is cached, this is slow as it triggers the creation of ._values, needs to perform an O(n log(n)) sort, as well as populate the hashmap of a transient Index.

This PR significantly speeds up this check by directly operating on .codes when .levels are individually sorted. This means we can leverage libalgos.is_lexsorted() which is O(n) (but has the downside of needing int64 when MultiIndex compacts levels).

       before           after         ratio
     [5bd57f90]       [3320dded]
-      31.9±0.9ms         627±50μs     0.02  index_cached_properties.IndexCache.time_is_monotonic_decreasing('MultiIndex')
-      29.6±0.7ms         528±80μs     0.02  index_cached_properties.IndexCache.time_is_monotonic('MultiIndex')
-        29.9±1ms         524±90μs     0.02  index_cached_properties.IndexCache.time_is_monotonic_increasing('MultiIndex')

closes Join operation takes more time. #27099
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry