So I think I understand now why @wesm has a separate .sortlevel rather than just allowing this in .sort_index.

Imagine

In [13]: df = pd.DataFrame(np.arange(12,0,-1).reshape(3,-1),index=pd.MultiIndex.from_tuples([(0,0),(0,1),(1,0)]))

In [14]: df
Out[14]: 
      0   1   2  3
0 0  12  11  10  9
  1   8   7   6  5
1 0   4   3   2  1

In [15]: df.sortlevel(0)
Out[15]: 
      0   1   2  3
0 0  12  11  10  9
  1   8   7   6  5
1 0   4   3   2  1

With the new .sort_index this becomes ambiguous:
df.sort_index(by=0): is that the 0th column or the 0th level of the index?
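
To make the clash concrete, here is a minimal sketch that rebuilds the frame from the session above and checks both readings of the key 0. The variable names (key, is_column_label, is_level_number) are only for illustration; the checks use nothing beyond df.columns and df.index.nlevels.

```python
import numpy as np
import pandas as pd

# Same frame as in the session above: default integer column labels
# and a two-level MultiIndex on the rows.
index = pd.MultiIndex.from_tuples([(0, 0), (0, 1), (1, 0)])
df = pd.DataFrame(np.arange(12, 0, -1).reshape(3, -1), index=index)

key = 0
# Reading 1: the key is a column label (the columns are labelled 0..3).
is_column_label = key in df.columns
# Reading 2: the key is a level position (the index has levels 0 and 1).
is_level_number = 0 <= key < df.index.nlevels

print(is_column_label, is_level_number)
```

Both checks come out true for this frame, so a bare by=0 gives no way to tell which one the caller meant.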

So we would now have to write
df.sort_index(by=None, level=0) to be unambiguous (this adds the level kw).

If we were NOT to allow the column pass-through, then the signature would be almost exactly like .sortlevel,
e.g.
df.sort_index(level=...)

Currently this will fail (in master), as you cannot sort a non-MultiIndex by level (though that is easily fixed).

So, I think that to avoid ambiguity we must pick one of:

5a) leave .sort_index alone, and consequently .sortlevel too
5b) add a level kw to .sort_index IN ADDITION to the current by kw. These are discrete sorters: by would be for the columns and level for the levels of the index, which avoids any ambiguity. The user is then in charge of saying, hey, sort by these columns or by these levels (in theory we could actually do a multi-sort).
5c) deprecate by and add level. This avoids ambiguity but disallows the column-sorting pass-through.
5d) for labels, you can actually figure out whether it's a level or a column (if there are duplicates, then you multi-sort or raise), so we could use the by keyword only. BUT the issue is that you can also sort on a level number! So we could not allow level numbers in ambiguous cases. More complicated, but maybe more natural.
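
A rough sketch of the resolution rule that 5d would need, written as plain Python so it does not depend on any pandas internals. The helper name classify_sort_key and its arguments are hypothetical, and it takes the "raise" branch on a clash rather than the multi-sort one; this is only one way the rule could look, not a proposed implementation.

```python
def classify_sort_key(key, columns, level_names):
    """Hypothetical helper: decide what a single `by` key refers to.

    columns     -- the frame's column labels
    level_names -- names of the index levels (None for an unnamed level)

    Returns "column" or "level". Raises ValueError if the key could be
    either, and KeyError if it matches nothing.
    """
    is_column = key in columns
    is_level_name = key is not None and key in level_names
    # A level can also be addressed by position; this is the troublesome case.
    is_level_number = (
        isinstance(key, int)
        and not isinstance(key, bool)
        and 0 <= key < len(level_names)
    )
    is_level = is_level_name or is_level_number

    if is_column and is_level:
        raise ValueError(f"{key!r} is both a column and an index level")
    if is_column:
        return "column"
    if is_level:
        return "level"
    raise KeyError(key)


# The frame from the session: columns 0..3, two unnamed index levels.
columns = [0, 1, 2, 3]
level_names = [None, None]

print(classify_sort_key(3, columns, level_names))  # only a column label
try:
    classify_sort_key(0, columns, level_names)
except ValueError as exc:
    print("ambiguous:", exc)
```

With string column labels and named levels the clash mostly disappears, which is why 5d feels natural for labels; it is the integer case, as in the frame above, that forces either an error or a separate level kw.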