sort_index fails on MultiIndex'ed DataFrame resulting from groupby.apply · Issue #15687 · pandas-dev/pandas

I feel like this is part of a suite of bugs that come from a failure to notice when a MultiIndex that was once lexsorted loses its lexsortedness. I submitted one example of this a couple years ago (see #8017), but this related bug persists.

On Pandas 0.19.0:

import numpy as np
import pandas as pd

np.random.seed(0)

df = pd.DataFrame(
    np.random.randn(8, 2),
    index=pd.MultiIndex.from_product([['a', 'b'], ['big', 'small'], ['red', 'blue']], names=['letter', 'size', 'color']),
    columns=['near', 'far']
)
df = df.sort_index()

def my_func(group):
    group.index = ['newz', 'newa']
    return group

res = df.groupby(level=['letter', 'size']).apply(my_func).sort_index()

print(res)
###OUTPUT###
                       near       far
letter size                          
a      big   newz  0.978738  2.240893
             newa  1.764052  0.400157
       small newz  0.950088 -0.151357
             newa  1.867558 -0.977278
b      big   newz  0.144044  1.454274
             newa -0.103219  0.410599
       small newz  0.443863  0.333674
             newa  0.761038  0.121675

So before the apply command, df was properly sorted on the row index. However, as you can see, res is not properly sorted: within every (letter, size) group, newz comes before newa, even though the creation of res ends with a sort_index command.
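To make "not properly sorted" checkable without reading the printout, here is a minimal sketch that compares the index entries against their plain Python sort order instead of trusting any sortedness flag on the MultiIndex. The helper names `is_sorted_by_value` and `sort_index_from_scratch` are hypothetical, and `demo` is a hand-built stand-in for `res` with the same label order. The workaround assumes that rebuilding the index from its tuples gives `sort_index` a clean starting point; I have not verified that on 0.19.0.

```python
import numpy as np
import pandas as pd


def is_sorted_by_value(index):
    """Return True if the index entries are in ascending order.

    Works on the materialized tuples, so it does not depend on how the
    MultiIndex stores its levels internally.
    """
    entries = list(index)
    return entries == sorted(entries)


def sort_index_from_scratch(frame):
    """Hypothetical workaround: rebuild the MultiIndex from its tuples, then sort."""
    rebuilt = frame.copy()
    rebuilt.index = pd.MultiIndex.from_tuples(
        list(frame.index), names=frame.index.names
    )
    return rebuilt.sort_index()


# Stand-in for `res`: the innermost labels arrive as newz, newa in every group.
idx = pd.MultiIndex.from_tuples(
    [
        (letter, size, new)
        for letter in ['a', 'b']
        for size in ['big', 'small']
        for new in ['newz', 'newa']
    ],
    names=['letter', 'size', None],
)
demo = pd.DataFrame(np.arange(16).reshape(8, 2), index=idx, columns=['near', 'far'])

print(is_sorted_by_value(demo.index))   # False: newz precedes newa
fixed = sort_index_from_scratch(demo)
print(is_sorted_by_value(fixed.index))  # True
```

Calling `is_sorted_by_value(res.index)` on the result above would be the direct test of whether `sort_index` did its job.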

This is a bug, right? I would think we want people to be able to assume that any call to sort_index returns a lexicographically sorted result, no?