sort_index fails on MultiIndex'ed DataFrame resulting from groupby.apply · Issue #15687 · pandas-dev/pandas
I feel like this is part of a suite of bugs that come from a failure to notice when a MultiIndex that was once lexsorted loses its lexsortedness. I submitted one example of this a couple of years ago (see #8017), but this related bug persists.
On Pandas 0.19.0:
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(
    np.random.randn(8, 2),
    index=pd.MultiIndex.from_product([['a', 'b'], ['big', 'small'], ['red', 'blue']], names=['letter', 'size', 'color']),
    columns=['near', 'far']
)
df = df.sort_index()  # df is lexsorted at this point

def my_func(group):
    # Replace each group's index with labels that are deliberately out of order
    group.index = ['newz', 'newa']
    return group

# The trailing sort_index() should leave res lexsorted
res = df.groupby(level=['letter', 'size']).apply(my_func).sort_index()
print(res)
###OUTPUT###
                       near       far
letter size
a      big   newz  0.978738  2.240893
             newa  1.764052  0.400157
       small newz  0.950088 -0.151357
             newa  1.867558 -0.977278
b      big   newz  0.144044  1.454274
             newa -0.103219  0.410599
       small newz  0.443863  0.333674
             newa  0.761038  0.121675
So before the apply command, df was properly sorted on the row index. However, as you can see, res is not properly sorted (newz comes before newa within every group), even though its creation ends with a sort_index command.
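To make the symptom checkable rather than something you have to eyeball, here is a minimal sketch. It builds a small frame with the same index shape as res directly (no groupby involved), tests whether the row index is in lexicographic order, and then tries one possible workaround: rebuilding the MultiIndex from its tuples before sorting. The helper names is_lexsorted and sort_index_rebuilt are hypothetical, and I am only assuming the rebuild sidesteps the problem on affected versions; I have not confirmed that.

```python
import pandas as pd


def is_lexsorted(frame):
    """Return True if the frame's row index is in lexicographic order."""
    return bool(frame.index.is_monotonic_increasing)


def sort_index_rebuilt(frame):
    """Hypothetical workaround: rebuild the MultiIndex from its tuples, then sort.

    Rebuilding gives the index freshly computed levels, so the sort does not
    depend on whatever level ordering the index inherited from earlier steps.
    """
    out = frame.copy()
    out.index = pd.MultiIndex.from_tuples(out.index.tolist(), names=out.index.names)
    return out.sort_index()


# A frame whose third index level is out of order, like res above
idx = pd.MultiIndex.from_tuples(
    [('a', 'big', 'newz'), ('a', 'big', 'newa'),
     ('a', 'small', 'newz'), ('a', 'small', 'newa')],
    names=['letter', 'size', None],
)
demo = pd.DataFrame({'near': [1.0, 2.0, 3.0, 4.0]}, index=idx)

print(is_lexsorted(demo))   # the index as constructed
fixed = sort_index_rebuilt(demo)
print(is_lexsorted(fixed))  # the index after rebuilding and sorting
```

Running is_lexsorted(res) right after the sort_index call in my example is a quick way to tell whether a given pandas version is affected.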
This is a bug, right? I would think we want people to be able to assume that any time they call sort_index, the result comes out lexicographically sorted, no?