API/WIP: .sorted by jreback · Pull Request #10726 · pandas-dev/pandas
So I think I now understand why @wesm has a `.sortlevel` rather than just allowing this in `.sort_index`.
Imagine
In [13]: df = DataFrame(np.arange(12,0,-1).reshape(3,-1),index=pd.MultiIndex.from_tuples([(0,0),(0,1),(1,0)]))
In [14]: df
Out[14]:
      0   1   2  3
0 0  12  11  10  9
  1   8   7   6  5
1 0   4   3   2  1
In [15]: df.sortlevel(0)
Out[15]:
      0   1   2  3
0 0  12  11  10  9
  1   8   7   6  5
1 0   4   3   2  1
In the new `.sort_index`, `df.sort_index(by=0)` is then ambiguous: is that the 0th column or the 0th level of the index? We would now have to write `df.sort_index(by=None, level=0)` to be unambiguous (this adds the `level` kw).
If we were NOT to allow pass-through columns, then the signature would be almost exactly like `.sortlevel`, e.g. `df.sort_index(level=...)`. Currently this will fail (in master), as you cannot sort a non-MultiIndex by level (though that is easily fixed).
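To make the two readings of `0` concrete, here is a minimal sketch. It assumes a pandas version in which column sorting is spelled `sort_values(by=...)` and `sort_index` accepts a `level` argument, which is not the `.sort_index(by=...)` signature being debated here; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Same frame as in the session above: integer column labels 0..3 and a
# two-level MultiIndex, so the bare integer 0 names both a column and a level.
df = pd.DataFrame(
    np.arange(12, 0, -1).reshape(3, -1),
    index=pd.MultiIndex.from_tuples([(0, 0), (0, 1), (1, 0)]),
)

# Reading 1: 0 is a column label, so sort the rows by that column's values.
by_column = df.sort_values(by=0)

# Reading 2: 0 is an index level, so sort the rows by the index.
by_level = df.sort_index(level=0)

print(by_column[0].tolist())  # column 0 after sorting by the column
print(by_level[0].tolist())   # column 0 after sorting by the level
```

The two calls return the rows in opposite orders for this frame, which is why a single `by=0` cannot stand for both.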
So, I think to avoid ambiguity we must pick one of:
- 5a) Leave `.sort_index` alone, and consequently `.sortlevel` as well.
- 5b) Add a `level` kw to `.sort_index` IN ADDITION to the current `by` kw. These are discrete sorters, in that `by` would be for the columns and `level` for the levels of the index, which avoids any ambiguity. The user is then in charge of saying, hey, sort by these columns or these levels (in theory we could actually do a multi-sort).
- 5c) Deprecate `by` and add `level`. This avoids ambiguity but disallows the column-sorting pass-through.
- 5d) For labels, you can actually figure out whether it's a level or a column (if there are duplicates then you multi-sort or raise), so we could use the `by` keyword only. BUT the issue is that you can sort on a level number! So we could not allow level numbers in ambiguous cases. More complicated, but maybe more natural.
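Option 5d could look roughly like the sketch below. `resolve_sort_key` and `sort_by` are hypothetical helpers written for illustration, not pandas API, and the sketch takes the "raise" branch on ambiguity rather than multi-sorting. It again assumes a pandas version that provides `sort_values(by=...)` and `sort_index(level=...)`.

```python
import numpy as np
import pandas as pd


def resolve_sort_key(df, key):
    """Hypothetical helper: classify `key` as a 'column' or a 'level'.

    Raises ValueError when the key could be read both ways.
    """
    is_column = key in df.columns
    is_level_name = key is not None and key in df.index.names
    # Integers can also address a level by position; this is the
    # problematic case called out in 5d.
    is_level_number = (
        isinstance(key, int)
        and not is_level_name
        and 0 <= key < df.index.nlevels
    )
    is_level = is_level_name or is_level_number
    if is_column and is_level:
        raise ValueError(f"{key!r} matches both a column and an index level")
    if is_column:
        return "column"
    if is_level:
        return "level"
    raise KeyError(key)


def sort_by(df, key):
    """Hypothetical single-keyword sorter built on resolve_sort_key."""
    if resolve_sort_key(df, key) == "column":
        return df.sort_values(by=key)
    return df.sort_index(level=key)


# Unambiguous case: string column labels and named index levels.
named = pd.DataFrame(
    {"x": [3, 1, 2]},
    index=pd.MultiIndex.from_tuples(
        [(1, "b"), (0, "b"), (0, "a")], names=["outer", "inner"]
    ),
)
print(sort_by(named, "x"))      # sorted by the column
print(sort_by(named, "outer"))  # sorted by the index level

# Ambiguous case: integer column labels plus level numbers.
ambiguous = pd.DataFrame(
    np.arange(12, 0, -1).reshape(3, -1),
    index=pd.MultiIndex.from_tuples([(0, 0), (0, 1), (1, 0)]),
)
try:
    sort_by(ambiguous, 0)
except ValueError as exc:
    print("refused:", exc)
```

With labels the helper can decide on its own, but a bare `0` on the second frame has to be rejected, which is the extra complexity 5d would carry.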