BUG: sort_index/sortlevel fails MultiIndex after columns are added. · Issue #8017 · pandas-dev/pandas

I have a DataFrame with a MultiIndex on the columns. The first level of the MultiIndex contains strings; the second contains floats (the problem persists if the second level is ints). I add a column whose label should not come last once the columns are sorted, then I sort the DataFrame by its columns. The result is not sorted: the new column stays last. The behavior is fine if the columns are a plain Index (even after adding columns), and the sort works in the MultiIndex case as long as no columns have been added since the DataFrame was created.

MWE:

import pandas as pd
import numpy as np

np.random.seed(0)
data = np.random.randn(3,4)

df_multi_float = pd.DataFrame(data, index=list('def'), columns=pd.MultiIndex.from_tuples([('red', i) for i in [1., 3., 2., 5.]]))

print(df_multi_float)

#OUTPUT
        red                              
          1         3         2         5
d  1.764052  0.400157  0.978738  2.240893
e  1.867558 -0.977278  0.950088 -0.151357
f -0.103219  0.410599  0.144044  1.454274

This sorts just fine as it is now:

print(df_multi_float.sort_index(axis=1))

#OUTPUT
        red                              
          1         2         3         5
d  1.764052  0.978738  0.400157  2.240893
e  1.867558  0.950088 -0.977278 -0.151357
f -0.103219  0.144044  0.410599  1.454274

But if I add a column to this `DataFrame` and then show it sorted, I get what looks to be a wrong result (the new column remains last, rather than being placed second-to-last as it should be):

df_multi_float[('red', 4.0)] = 'world'

print(df_multi_float.sort_index(axis=1))

#OUTPUT
        red                                  red
          1         2         3         5      4
d  1.764052  0.978738  0.400157  2.240893  world
e  1.867558  0.950088 -0.977278 -0.151357  world
f -0.103219  0.144044  0.410599  1.454274  world
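
To make the expected ordering explicit, here is a minimal sketch that builds the sorted column order without relying on `sort_index`: it sorts the column tuples in plain Python and reindexes. The names `df`, `expected_order`, `expected`, and `actual_order` are only for illustration, and the reindex step is an assumed way to check the result (and a possible stopgap), not something taken from the pandas internals.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
data = np.random.randn(3, 4)

# Same setup as the MWE: string first level, float second level.
columns = pd.MultiIndex.from_tuples([('red', i) for i in [1., 3., 2., 5.]])
df = pd.DataFrame(data, index=list('def'), columns=columns)
df[('red', 4.0)] = 'world'  # column added after construction

# Sort the column tuples in plain Python, independently of sort_index.
expected_order = sorted(df.columns.tolist())
expected = df.reindex(columns=pd.MultiIndex.from_tuples(expected_order))

# Compare against what sort_index produces on the installed version.
actual_order = df.sort_index(axis=1).columns.tolist()
print(expected_order)
print(actual_order == expected_order)  # False would reproduce this report
```

If the comparison prints `False`, the installed version shows the behavior reported here, and `expected` holds the frame in the order I would expect from the sort.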

I'm able to reproduce this behavior on two systems. The first runs pandas 0.14.0 and NumPy 1.8.1, and the second runs pandas 0.14.1 and NumPy 1.8.2. The issue is also described here: http://stackoverflow.com/questions/25287130/pandas-sort-index-fails-with-multiindex-containing-floats-as-one-level-when-col?noredirect=1#comment39408150_25287130
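
For comparison, the plain `Index` control case mentioned at the top can be sketched as follows. It uses the same data with a single-level float column index; `df_flat` and `sorted_cols` are illustrative names that do not appear in the MWE above.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
data = np.random.randn(3, 4)

# Single-level columns instead of a MultiIndex.
df_flat = pd.DataFrame(data, index=list('def'), columns=[1., 3., 2., 5.])
df_flat[4.0] = 'world'  # column added after construction

# With a plain Index, the added column is expected to land between 3 and 5.
sorted_cols = df_flat.sort_index(axis=1).columns.tolist()
print(sorted_cols)
```

In this case the added column ends up in its sorted position for me, which is why the problem looks specific to MultiIndex columns.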