BUG: different behaviors of sort_index() and sort_index(level=0) · Issue #13431 · pandas-dev/pandas
Inspired by some bug reports around MultiIndex sortedness (http://stackoverflow.com/questions/31427466/ensuring-lexicographical-sort-in-pandas-multiindex, #10651, #9212), I found that sort_index() sometimes cannot make a MultiIndex ready for slicing, whereas sort_index(level=0) can (and so can sortlevel()).
In [119]: pd.__version__
Out[119]: u'0.18.1'
In [120]: df = pd.DataFrame({'col1': ['b','d','b','a'], 'col2': [3,1,1,2], 'data':['one','two','three','four']})
In [121]: df2 = df.set_index(['col1','col2'])
In [122]: df2.index.set_levels(['b','d','a'], level='col1', inplace=True)
In [123]: df2.index.set_labels([0,1,0,2], level='col1', inplace=True)
In [124]: df2.sortlevel()
Out[124]:
            data
col1 col2
b    1     three
     3       one
d    1       two
a    2      four

In [125]: df2.sort_index()
Out[125]:
            data
col1 col2
a    2      four
b    1     three
     3       one
d    1       two

In [126]: df2.sort_index(level=0)
Out[126]:
            data
col1 col2
b    1     three
     3       one
d    1       two
a    2      four
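The two orderings above can be reproduced without pandas. The sketch below is a pure-Python model, not pandas code: `levels`, `labels` and `decode` are hypothetical names chosen to mirror the 0.18-era MultiIndex attributes. Each level keeps its distinct values plus one integer label per row. Sorting rows by their label tuples reproduces Out[124]/Out[126], while sorting by the decoded values reproduces Out[125].

```python
# Hypothetical pure-Python model of the df2 index built above (not pandas internals):
# each level stores its distinct values ("levels") plus one integer code per row ("labels").
levels = [["b", "d", "a"], [1, 2, 3]]   # col1 levels after set_levels; col2 levels as created
labels = [[0, 1, 0, 2], [2, 0, 0, 1]]   # col1 labels after set_labels; col2 labels unchanged
data = ["one", "two", "three", "four"]


def decode(levels, labels):
    """Turn per-level integer labels back into one tuple of values per row."""
    return [tuple(levels[k][labels[k][i]] for k in range(len(levels)))
            for i in range(len(labels[0]))]


rows = decode(levels, labels)  # [('b', 3), ('d', 1), ('b', 1), ('a', 2)]

# Order rows by their integer label tuples (the order sortlevel() / sort_index(level=0) gave).
by_labels = sorted(range(len(rows)), key=lambda i: tuple(lab[i] for lab in labels))

# Order rows by their actual values (the order sort_index() gave).
by_values = sorted(range(len(rows)), key=lambda i: rows[i])

print([(rows[i], data[i]) for i in by_labels])
# [(('b', 1), 'three'), (('b', 3), 'one'), (('d', 1), 'two'), (('a', 2), 'four')]
print([(rows[i], data[i]) for i in by_values])
# [(('a', 2), 'four'), (('b', 1), 'three'), (('b', 3), 'one'), (('d', 1), 'two')]
```

Because col1's levels are ['b', 'd', 'a'] rather than ascending, label order and value order disagree for this particular index.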
While df2.sort_index() does give output that looks lexicographically sorted, it DOES NOT support slicing:
df2.sort_index().loc['b':'d']

KeyError                                  Traceback (most recent call last)
... (a lot of lines omitted here) ...
/Users/yimengzh/miniconda/envs/cafferc3/lib/python2.7/site-packages/pandas/indexes/multi.py in _partial_tup_index(self, tup, side)
   1488             raise KeyError('Key length (%d) was greater than MultiIndex'
   1489                            ' lexsort depth (%d)' %
-> 1490                            (len(tup), self.lexsort_depth))
   1491
   1492         n = len(tup)

KeyError: 'Key length (1) was greater than MultiIndex lexsort depth (0)'
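The KeyError compares the key length with lexsort_depth, which in 0.18.x appears to be computed from the integer labels rather than from the values (see pandas/indexes/multi.py). Below is a simplified model, assuming that the depth is the largest k for which the rows' label tuples over the first k levels never decrease, and that a first-level slice is located by binary search over the level-0 labels. The helpers `is_lexsorted`, `lexsort_depth` and `slice_first_level` are hypothetical stand-ins modeled on that behavior, not the pandas functions themselves.

```python
from bisect import bisect_left, bisect_right


def is_lexsorted(label_arrays):
    """True if the row tuples formed by these label arrays never decrease."""
    keys = list(zip(*label_arrays))
    return all(a <= b for a, b in zip(keys, keys[1:]))


def lexsort_depth(labels):
    """Largest k such that the first k levels' labels are lexsorted (0 if none)."""
    for k in range(len(labels), 0, -1):
        if is_lexsorted(labels[:k]):
            return k
    return 0


def slice_first_level(levels, labels, start, stop):
    """Row positions for a .loc[start:stop] slice on level 0 (hypothetical helper).

    Binary search over labels[0] is only valid when that array is sorted,
    hence the depth check that mirrors the KeyError above. Assumes start and
    stop are present in levels[0].
    """
    depth = lexsort_depth(labels)
    if depth < 1:
        raise KeyError("Key length (1) was greater than MultiIndex "
                       "lexsort depth (%d)" % depth)
    codes = labels[0]
    lo = bisect_left(codes, levels[0].index(start))
    hi = bisect_right(codes, levels[0].index(stop))
    return list(range(lo, hi))


levels = [["b", "d", "a"], [1, 2, 3]]
# Labels of df2.sort_index() (rows a/2, b/1, b/3, d/1) and of
# df2.sortlevel() (rows b/1, b/3, d/1, a/2), derived from the levels above.
value_sorted = [[2, 0, 0, 1], [1, 0, 2, 0]]
label_sorted = [[0, 0, 1, 2], [0, 2, 0, 1]]

print(lexsort_depth(value_sorted))                        # 0
print(lexsort_depth(label_sorted))                        # 2
print(slice_first_level(levels, label_sorted, "b", "d"))  # [0, 1, 2]
```

In this model, calling slice_first_level(levels, value_sorted, "b", "d") raises the same KeyError as the session above, even though the values of that index look sorted.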
So I have two questions.
- Is this the intended behavior? I thought level=0 and level=None were synonyms, but they are not. Looking at the code (https://github.com/pydata/pandas/blob/4de83d25d751d8ca102867b2d46a5547c01d7248/pandas/core/frame.py#L3245-L3247), there is indeed special processing when level is not None.
- What does "lexicographically sorted" mean? I think it should mean sorted in terms of levels, not labels. Is this what "lexicographically sorted" means in the Advanced Indexing documentation? If so, then I think sort_index(level=0) is correct, yet sort_index() is not.
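The two notions only diverge here because col1's levels (['b', 'd', 'a']) are not in ascending order. The sketch below uses a hypothetical helper, `sort_levels_monotonic`, that reorders each level's values ascending and remaps that level's labels so every row keeps the same value. It illustrates that once the levels are ascending, sorting by labels and sorting by values give the same row order; it is not a claim about what pandas does or should do.

```python
def sort_levels_monotonic(levels, labels):
    """Hypothetical helper: put each level's values in ascending order and
    remap that level's integer labels so every row keeps the same value."""
    new_levels, new_labels = [], []
    for lev, lab in zip(levels, labels):
        order = sorted(range(len(lev)), key=lambda j: lev[j])  # old codes, by ascending value
        remap = {old: new for new, old in enumerate(order)}    # old code -> new code
        new_levels.append([lev[j] for j in order])
        new_labels.append([remap[c] for c in lab])
    return new_levels, new_labels


# The unsorted df2 index from the session above.
levels = [["b", "d", "a"], [1, 2, 3]]
labels = [[0, 1, 0, 2], [2, 0, 0, 1]]
new_levels, new_labels = sort_levels_monotonic(levels, labels)

n = len(labels[0])
old_rows = [tuple(lev[lab[i]] for lev, lab in zip(levels, labels)) for i in range(n)]
new_rows = [tuple(lev[lab[i]] for lev, lab in zip(new_levels, new_labels)) for i in range(n)]

by_values = sorted(range(n), key=lambda i: old_rows[i])
by_new_labels = sorted(range(n), key=lambda i: tuple(lab[i] for lab in new_labels))

print(new_levels)                  # [['a', 'b', 'd'], [1, 2, 3]]
print(new_labels)                  # [[1, 2, 1, 0], [2, 0, 0, 1]]
print(old_rows == new_rows)        # True: same rows, different encoding
print(by_values == by_new_labels)  # True: label order now matches value order
```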
Thanks.
output of pd.show_versions()
In [130]: pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Darwin
OS-release: 14.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: 23.0.0
Cython: 0.23.4
numpy: 1.11.0
scipy: 0.17.0
statsmodels: None
xarray: None
IPython: 4.2.0
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None