Categorical in dataframe is sorted lexically. · Issue #7848 · pandas-dev/pandas

code

import pandas as pd

df = pd.DataFrame({"id": [6, 5, 4, 3, 2, 1], "raw_grade": ['a', 'b', 'b', 'a', 'a', 'e']})
df["grade"] = pd.Categorical(df["raw_grade"])
df['grade'].cat.reorder_levels(['b', 'e', 'a'])

Sorting on 'grade' alone orders the rows by the order of the levels (b, e, a):

df.sort(columns=['grade'])

correct output

  id   raw_grade  grade
4 5    b          b
3 4    b          b
0 1    e          e
2 6    a          a
1 3    a          a
5 2    a          a

code

Sorting on 'grade' and 'id' together orders 'grade' lexically instead:

df.sort(columns=['grade', 'id'])

wrong output

  id   raw_grade  grade
4 2    a          a
3 3    a          a
0 6    a          a
2 4    b          b
1 5    b          b
5 1    e          e

When the columns list contains more than one element, Categorical columns are sorted lexically instead of by the order of their levels.
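To make the expected result concrete, here is a minimal plain-Python sketch, independent of pandas, of the two orderings for the (id, raw_grade) pairs above. The names `rows`, `rank`, `by_level` and `lexical` are illustrative only and do not come from pandas; the sketch assumes ties within a level should be broken by ascending 'id'.

```python
# (id, raw_grade) pairs taken from the DataFrame in the report.
rows = [(6, "a"), (5, "b"), (4, "b"), (3, "a"), (2, "a"), (1, "e")]

# Level order requested with reorder_levels(['b', 'e', 'a']).
levels = ["b", "e", "a"]
rank = {level: pos for pos, level in enumerate(levels)}

# Expected: compare grades by their position in `levels`, then by id.
by_level = sorted(rows, key=lambda r: (rank[r[1]], r[0]))

# Reported: grades compared as plain strings, then by id.
lexical = sorted(rows, key=lambda r: (r[1], r[0]))

print(by_level)  # [(4, 'b'), (5, 'b'), (1, 'e'), (2, 'a'), (3, 'a'), (6, 'a')]
print(lexical)   # [(2, 'a'), (3, 'a'), (6, 'a'), (4, 'b'), (5, 'b'), (1, 'e')]
```

The second list has the same 'id' order as the wrong output above (2, 3, 6, 4, 5, 1), which is consistent with the multi-column sort falling back to string comparison on 'grade'.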

pandas: 0.14.1-78-g24b309f