BUG: sort_index/sortlevel fails MultiIndex after columns are added. · Issue #8017 · pandas-dev/pandas
I have a DataFrame with a MultiIndex on the columns. The first level of the MultiIndex contains strings. The second, floats (though the problem persists if the second level is ints). I add a column to the DataFrame (which should not come last if the columns are sorted). I try to sort the DataFrame. The result does not seem to be sorted. The behavior is fine if the columns are simply an Index (even after adding columns). And the sort works fine in the MultiIndex case as long as no columns have been added since the DataFrame was created.
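For contrast, here is a minimal sketch of the flat Index case described above, where sorting behaves as expected even after a column is added. The name df_flat is only illustrative and does not appear elsewhere in this report.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
data = np.random.randn(3, 4)

# Flat (single-level) float column index, deliberately out of order.
df_flat = pd.DataFrame(data, index=list('def'), columns=[1., 3., 2., 5.])

# Add a column whose label belongs between 3.0 and 5.0 once sorted.
df_flat[4.0] = 'world'

# Sort the columns and show the resulting label order.
sorted_flat = df_flat.sort_index(axis=1)
print(list(sorted_flat.columns))  # [1.0, 2.0, 3.0, 4.0, 5.0]
```

Here the new 4.0 column lands between 3.0 and 5.0, which is the ordering I would also expect in the MultiIndex case.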
MWE:
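import pandas as pd
import numpy as np

np.random.seed(0)  # fixed seed so the output below is reproducible
data = np.random.randn(3, 4)
# Column MultiIndex: level 0 holds strings, level 1 holds floats (deliberately unsorted).
columns = pd.MultiIndex.from_tuples([('red', i) for i in [1., 3., 2., 5.]])
df_multi_float = pd.DataFrame(data, index=list('def'), columns=columns)
print(df_multi_float)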
#OUTPUT
        red
          1         3         2         5
d  1.764052  0.400157  0.978738  2.240893
e  1.867558 -0.977278  0.950088 -0.151357
f -0.103219  0.410599  0.144044  1.454274
This sorts just fine as it is now:
print(df_multi_float.sort_index(axis=1))
#OUTPUT
        red
          1         2         3         5
d  1.764052  0.978738  0.400157  2.240893
e  1.867558  0.950088 -0.977278 -0.151357
f -0.103219  0.144044  0.410599  1.454274
But if I add a column to this DataFrame and then show it sorted, I get what looks to be a wrong result (the new column remains last, rather than being placed second-to-last as it should be):
df_multi_float[('red', 4.0)] = 'world'
print(df_multi_float.sort_index(axis=1))
#OUTPUT
        red                                  red
          1         2         3         5      4
d  1.764052  0.978738  0.400157  2.240893  world
e  1.867558  0.950088 -0.977278 -0.151357  world
f -0.103219  0.144044  0.410599  1.454274  world
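Rather than judging the printed frame by eye, the result can also be checked programmatically. The sketch below uses a small helper, columns_are_sorted, which is a hypothetical name introduced here and not part of pandas. It compares the column labels against Python's own sorted() ordering of the same tuples.

```python
import numpy as np
import pandas as pd


def columns_are_sorted(df):
    """Return True if the column labels appear in ascending order.

    Hypothetical helper for this report, not a pandas API.
    """
    labels = list(df.columns)
    return labels == sorted(labels)


np.random.seed(0)
columns = pd.MultiIndex.from_tuples([('red', i) for i in [1., 3., 2., 5.]])
df = pd.DataFrame(np.random.randn(3, 4), index=list('def'), columns=columns)

# Add the column that should end up second-to-last after sorting.
df[('red', 4.0)] = 'world'

result = df.sort_index(axis=1)
print(list(result.columns))
# Given the output shown above, this should print False on the affected
# versions; a correct sort would give True.
print(columns_are_sorted(result))
```

Because the helper relies only on plain tuple comparison, it does not depend on the MultiIndex internals that seem to be involved in the problem.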
I'm able to produce this behavior on two systems. The first runs Pandas 0.14.0 and Numpy 1.8.1 and the second runs Pandas 0.14.1 and Numpy 1.8.2. This issue is described here: http://stackoverflow.com/questions/25287130/pandas-sort-index-fails-with-multiindex-containing-floats-as-one-level-when-col?noredirect=1#comment39408150_25287130