BUG/API: degenerate monotonic but not lexsorted MultiIndexes · Issue #15797 · pandas-dev/pandas

After #15694, there are still some degenerate cases where, even after .sort_index(), a MI is not lexsorted (although it does report as monotonic).

xref #15694 (comment)

We have a bunch of tests which assert this in test_multilevel.py

In [1]: from pandas import DataFrame, MultiIndex
   ...: 
   ...: idx = MultiIndex([['A', 'B', 'C'], ['c', 'b', 'a']],
   ...:                  [[0, 1, 2, 0, 1, 2], [0, 2, 1, 1, 0, 2]])
   ...: 
   ...: df = DataFrame({'col': range(len(idx))}, index=idx)
   ...: 

In [2]: df
Out[2]: 
     col
A c    0
B a    1
C b    2
A b    3
B c    4
C a    5

In [3]: df.sort_index()
Out[3]: 
     col
A b    3
  c    0
B a    1
  c    4
C a    5
  b    2

In [4]: df.sort_index().index.is_lexsorted()
Out[4]: False

In [5]: df.sort_index().index.is_monotonic
Out[5]: True

This can be fixed by the diff above. In essence, rather than .sort_index() just computing indexers off of a _reconstructed index, we actually _reconstruct the index on the returned object.

But this breaks a few things, namely some tests involving .stack(), as that does .sort_index() internally, so we would need to compensate. (And of course this subtly changes the actual returned values, in that they are actually sorted :>)
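
For illustration, here is a rough sketch of what "reconstructing the index on the returned object" means. It is not the diff referenced above and does not touch pandas internals. It assumes pandas >= 0.24 (for the `levels=`/`codes=` keywords and the `.codes` attribute); `codes_are_lexsorted` is a hypothetical helper written for this example, not a pandas API. The rows are ordered by value while the original (unsorted) levels are kept, which gives the degenerate state; rebuilding the index from its tuples re-factorizes the levels so the codes are lexsorted as well.

```python
import pandas as pd

# Same index as in the report: the second level is stored as ['c', 'b', 'a'].
idx = pd.MultiIndex(
    levels=[["A", "B", "C"], ["c", "b", "a"]],
    codes=[[0, 1, 2, 0, 1, 2], [0, 2, 1, 1, 0, 2]],
)


def codes_are_lexsorted(mi):
    """Hypothetical helper: True if the integer codes are in lexicographic order."""
    rows = list(zip(*(c.tolist() for c in mi.codes)))
    return rows == sorted(rows)


# Order the rows by their values, but keep the original levels:
# .take() only reorders the codes and leaves the levels untouched.
order = sorted(range(len(idx)), key=lambda i: idx[i])
by_value = idx.take(order)

# Rebuild the index from its tuples, so the levels are re-factorized
# (and therefore sorted) on the returned object.
rebuilt = pd.MultiIndex.from_tuples(by_value.tolist())

print(codes_are_lexsorted(by_value))  # values are in order, codes are not
print(codes_are_lexsorted(rebuilt))   # codes are in order after the rebuild
print(list(rebuilt.levels[1]))
```

Both indexes contain the same tuples in the same order; only the level/code representation differs, which is why anything that relies on the codes being sorted sees a difference.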