sort_index fails on MultiIndex'ed DataFrame resulting from groupby.apply
I feel like this is part of a suite of bugs that come from a failure to notice when a MultiIndex that was once lexsorted loses its lexsortedness. I submitted one example of this a couple years ago (see #8017), but this related bug persists.
On Pandas 0.19.0:
import numpy as np
import pandas as pd
np.random.seed(0)
df = pd.DataFrame(
    np.random.randn(8, 2),
    index=pd.MultiIndex.from_product(
        [['a', 'b'], ['big', 'small'], ['red', 'blue']],
        names=['letter', 'size', 'color']),
    columns=['near', 'far']
)
df = df.sort_index()

def my_func(group):
    # Replace each group's index with two labels that are out of order.
    group.index = ['newz', 'newa']
    return group

res = df.groupby(level=['letter', 'size']).apply(my_func).sort_index()
print(res)
###OUTPUT###
                       near       far
letter size
a      big   newz  0.978738  2.240893
             newa  1.764052  0.400157
       small newz  0.950088 -0.151357
             newa  1.867558 -0.977278
b      big   newz  0.144044  1.454274
             newa -0.103219  0.410599
       small newz  0.443863  0.333674
             newa  0.761038  0.121675
So before the apply command, df was properly sorted on the row index. However, as you can see, res is not properly sorted, even though its creation ends with a sort_index command.
This is a bug, right? I would think we want people to be able to assume that any time they call sort_index, the result comes out lexicographically sorted, no?
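One way to state that expectation as a check, as a minimal sketch: compare the index tuples against their sorted order in plain Python, so the check does not depend on whatever sortedness bookkeeping the MultiIndex carries internally. The helper name is_sorted_by_values is hypothetical, and the index here is built directly with from_tuples rather than through groupby.apply, so this illustrates the check itself and is not a reproduction of the bug. Running the same helper on res.index from the example above is the interesting case.

```python
import pandas as pd


def is_sorted_by_values(index):
    """Hypothetical helper: True if the index entries are in ascending order.

    Iterating a MultiIndex yields one tuple per row, and Python compares
    tuples lexicographically, so this works for multi-level indexes.
    """
    entries = list(index)
    return entries == sorted(entries)


# Build an index shaped like `res`, with the last level out of order
# ('newz' comes before 'newa' inside every group).
idx = pd.MultiIndex.from_tuples(
    [(letter, size, new)
     for letter in ['a', 'b']
     for size in ['big', 'small']
     for new in ['newz', 'newa']],
    names=['letter', 'size', None],
)
frame = pd.DataFrame({'near': range(8)}, index=idx)

before = is_sorted_by_values(frame.index)
after = is_sorted_by_values(frame.sort_index().index)
print(before, after)
```

If sort_index does what its name promises, the second value should be True for any input, including a frame that came out of groupby.apply.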