BUG/API: degenerate monotonic but not lexsorted MultiIndexes · Issue #15797 · pandas-dev/pandas
Follow-up to #15694. There are still some degenerate cases where, even after .sort_index(),
a MultiIndex is not lexsorted (nor monotonic).
xref #15694 (comment)
We have a bunch of tests which assert this in test_multilevel.py
In [1]: from pandas import DataFrame, MultiIndex
   ...:
   ...: idx = MultiIndex([['A', 'B', 'C'], ['c', 'b', 'a']],
   ...:                  [[0,1,2,0,1,2], [0,2,1,1,0,2]])
   ...:
   ...: df = DataFrame({'col': range(len(idx))}, index=idx)
   ...:
In [2]: df
Out[2]:
     col
A c    0
B a    1
C b    2
A b    3
B c    4
C a    5
In [3]: df.sort_index()
Out[3]:
     col
A b    3
  c    0
B a    1
  c    4
C a    5
  b    2
In [4]: df.sort_index().index.is_lexsorted()
Out[4]: False
In [5]: df.sort_index().index.is_monotonic
Out[5]: True
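To illustrate why the two flags can disagree, here is a minimal pure-Python sketch (not the pandas internals). It assumes that lexsortedness is judged on the integer codes while monotonicity is judged on the label values, and that the sorted result keeps the original, reverse-ordered second level. The helper `is_sorted` and the variable names are hypothetical.

```python
# Levels and codes from the example above; level 1 is stored as ['c', 'b', 'a'].
levels = [['A', 'B', 'C'], ['c', 'b', 'a']]
codes = [[0, 1, 2, 0, 1, 2], [0, 2, 1, 1, 0, 2]]


def is_sorted(seq):
    """Return True if seq is in non-decreasing order."""
    return all(a <= b for a, b in zip(seq, seq[1:]))


# Row-wise code tuples and the label tuples they stand for.
code_rows = list(zip(*codes))
value_rows = [tuple(lev[c] for lev, c in zip(levels, row)) for row in code_rows]

# Sort the rows by label value while keeping the original levels
# (assumption: this mirrors what the returned index looks like).
order = sorted(range(len(value_rows)), key=lambda i: value_rows[i])
sorted_values = [value_rows[i] for i in order]
sorted_codes = [code_rows[i] for i in order]

values_monotonic = is_sorted(sorted_values)  # judged on the labels
codes_lexsorted = is_sorted(sorted_codes)    # judged on the integer codes
print(order, values_monotonic, codes_lexsorted)
```

The labels come out in order, but because 'c' has code 0 and 'b' has code 1, the code tuples for the first two rows are (0, 1) followed by (0, 0), which is not sorted.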
This can be fixed by the diff above. In essence, rather than .sort_index()
just computing indexers off of the _reconstructed index, we actually _reconstruct the index on the returned object.
But this breaks a few things, namely some tests involving .stack(),
as that calls .sort_index()
internally, so we would need to compensate. (And of course this subtly changes the actual returned values, in that they are actually sorted :>)
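A rough sketch of what reconstructing the index on the returned object could amount to, in plain Python rather than the actual diff: sort each level and remap its codes so that code order matches label order. The function `reconstruct` is hypothetical and assumes the labels within each level are unique.

```python
def reconstruct(levels, codes):
    """Sort each level and remap its codes so code order matches label order."""
    new_levels, new_codes = [], []
    for level, level_codes in zip(levels, codes):
        sorted_level = sorted(level)
        # Map each old code to the position of its label in the sorted level.
        remap = {old: sorted_level.index(label) for old, label in enumerate(level)}
        new_levels.append(sorted_level)
        new_codes.append([remap[c] for c in level_codes])
    return new_levels, new_codes


# The sorted rows from the example, still expressed against the original levels.
levels = [['A', 'B', 'C'], ['c', 'b', 'a']]
codes = [[0, 0, 1, 1, 2, 2], [1, 0, 2, 0, 2, 1]]

new_levels, new_codes = reconstruct(levels, codes)
rows = list(zip(*new_codes))
lexsorted = all(a <= b for a, b in zip(rows, rows[1:]))
print(new_levels, new_codes, lexsorted)
```

The labels each row refers to are unchanged; only the level order and the codes differ, which is the subtle change in the returned values mentioned above.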