API/WIP: .sorted by jreback · Pull Request #10726 · pandas-dev/pandas
So I think I now understand why @wesm has a separate .sortlevel rather than just allowing this in .sort_index.
Imagine
In [13]: df = pd.DataFrame(np.arange(12, 0, -1).reshape(3, -1), index=pd.MultiIndex.from_tuples([(0, 0), (0, 1), (1, 0)]))
In [14]: df
Out[14]:
      0   1   2  3
0 0  12  11  10  9
  1   8   7   6  5
1 0   4   3   2  1
In [15]: df.sortlevel(0)
Out[15]:
      0   1   2  3
0 0  12  11  10  9
  1   8   7   6  5
1 0   4   3   2  1
in the new .sort_index this is ambiguous: with df.sort_index(by=0), is that the 0th column or the 0th level of the index?
so we would now have to write df.sort_index(by=None, level=0) to be unambiguous (this adds the level kw)
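To make the two readings of "0" concrete, here is a small sketch on the same frame. It assumes a pandas version that provides both DataFrame.sort_values(by=...) and DataFrame.sort_index(level=...); it is not the signature being proposed here, only an illustration that the two interpretations give different row orders.

```python
import numpy as np
import pandas as pd

# Same frame as above: a 3x4 block counting down from 12, with a two-level index.
df = pd.DataFrame(
    np.arange(12, 0, -1).reshape(3, -1),
    index=pd.MultiIndex.from_tuples([(0, 0), (0, 1), (1, 0)]),
)

# Reading 1: "0" is a column label. Sorting ascending by the values in
# column 0 (12, 8, 4) reverses the rows.
by_column = df.sort_values(by=0)

# Reading 2: "0" is an index level. The index is already ordered on
# level 0, so the rows stay where they are.
by_level = df.sort_index(level=0)

print(by_column)
print(by_level)
```

The same integer selects two different sort keys, which is why a single by=0 cannot be resolved without an extra keyword or a rule.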
If we were NOT to allow the pass thru to columns, then the signature would be almost exactly like .sortlevel,
e.g. df.sort_index(level=...)
currently this will fail (in master) as you cannot sort a non-MultiIndex by level (though that is easily fixed).
so, I think to avoid ambiguity we must pick one of:
- 5a) leave .sort_index alone, and consequently keep .sortlevel
- 5b) add a level kw to .sort_index IN ADDITION to the current by kw. These are discrete sorters, in that by would be for the columns and level for the levels of the index, so this avoids any ambiguity. The user is then in charge of saying, hey, sort by these columns or by these levels (in theory we could actually do a multi-sort).
- 5c) deprecate by and add level. This avoids ambiguity but disallows the column-sorting pass thru.
- 5d) for labels, you can actually figure out whether it's a level or a column (if there are duplicates then you multi-sort or raise), so we could use the by keyword only, BUT the issue is that you can sort on a level number! So we could not allow level numbers in ambiguous cases. More complicated, but maybe more natural.
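A rough sketch of the resolution rule that 5d would need, in plain Python. The helper name resolve_sort_key and its arguments are hypothetical and not part of pandas; the sketch takes the "raise" branch for ambiguous keys rather than the multi-sort branch.

```python
def resolve_sort_key(key, columns, level_names):
    """Classify `key` as ("column", key) or ("level", key).

    Hypothetical helper, not a pandas API. `columns` is the list of column
    labels; `level_names` is the list of index level names (None for an
    unnamed level). Raises ValueError when the key could mean either.
    """
    is_column = key in columns
    # A key matches a level either by name or by position (level number).
    is_level_name = key is not None and key in level_names
    is_level_number = isinstance(key, int) and 0 <= key < len(level_names)
    is_level = is_level_name or is_level_number

    if is_column and is_level:
        raise ValueError(
            f"{key!r} is both a column label and an index level; "
            "pass it explicitly as a column or a level"
        )
    if is_column:
        return ("column", key)
    if is_level:
        return ("level", key)
    raise KeyError(key)


# String labels are easy to tell apart when names do not collide.
print(resolve_sort_key("a", ["a", "b"], ["x", "y"]))
print(resolve_sort_key("x", ["a", "b"], ["x", "y"]))

# With the frame from this comment (columns 0..3, two unnamed levels),
# the key 0 is exactly the ambiguous case and is rejected.
try:
    resolve_sort_key(0, [0, 1, 2, 3], [None, None])
except ValueError as exc:
    print(exc)
```

The integer case is what makes 5d awkward: any frame with default integer column labels collides with level numbers, so level numbers would have to be refused there.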