Partial Selection on MultiIndex: Control what index depth is returned.

Partial selection using .xs() or .ix[] on a subset of index levels returns a DataFrame with the fixed (selected) levels dropped, which is a very nice feature. However, when you select using a tuple that covers all levels, you get a DataFrame with every index level retained. It would be nice to have an option controlling which index levels are returned when sub-selecting on a subset of levels, so that the result stays consistent when you reach the lowest index level.
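To make the depth change easier to see than in the 100000-row session further down, here is a minimal sketch on a small frame. It assumes a recent pandas where `.ix` is no longer available, so it uses `.xs()` only; the frame, the variable names and the choice of three index levels are made up for illustration. Each full key appears twice so that a full-depth key still returns a DataFrame rather than a single row.

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for the 5-level frame: 3 index levels, each full key
# repeated twice so the index is sorted but non-unique.
unique_keys = pd.MultiIndex.from_product([[0, 1]] * 3, names=["A0", "A1", "A2"])
small_index = unique_keys.repeat(2)
small = pd.DataFrame({"A5": np.arange(len(small_index))}, index=small_index)

# Select with keys of increasing depth and record how many index levels remain.
depth_by_key = {}
for key in [0, (0, 1), (0, 1, 1)]:
    selected = small.xs(key)
    depth_by_key[key] = selected.index.nlevels
    print(key, "->", list(selected.index.names))
```

The two partial keys should come back with the fixed levels dropped; the full-depth key is the case this issue is about, so its index is printed rather than assumed.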

Perhaps there could be a "drop_fixed_index" parameter option when sub-selecting.
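As a rough sketch of what such an option could look like: later pandas releases (not the 0.11.0.dev build used in the session below) give `DataFrame.xs` a `drop_level` argument, and the wrapper here simply forwards to it. The function name `select_fixed` and the example frame are hypothetical, and `drop_fixed_index` is only the name proposed above, not an existing pandas parameter.

```python
import numpy as np
import pandas as pd


def select_fixed(frame, key, drop_fixed_index=True):
    """Select rows whose leading index levels match `key`.

    Hypothetical helper: `drop_fixed_index` is the option proposed in this
    issue and is forwarded to the existing `drop_level` argument of `xs`.
    """
    return frame.xs(key, drop_level=drop_fixed_index)


# Illustrative 3-level frame; each full key appears twice (non-unique index).
example_index = pd.MultiIndex.from_product(
    [[0, 1]] * 3, names=["A0", "A1", "A2"]
).repeat(2)
frame = pd.DataFrame({"A5": np.arange(len(example_index))}, index=example_index)

# With the fixed levels kept, every key depth returns the same index depth.
kept_depths = [
    select_fixed(frame, key, drop_fixed_index=False).index.nlevels
    for key in [0, (0, 1), (0, 1, 1)]
]

# Default behaviour for a partial key: the fixed levels are dropped.
dropped_depth = select_fixed(frame, (0, 1)).index.nlevels
print(kept_depths, dropped_depth)
```

Keeping the levels is the setting that gives a uniform result at every depth, which is the consistency asked for here; it does not cover dropping all levels for a full-depth key.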

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: print pd.__version__
0.11.0.dev-80945b6

In [4]: # Generate Test DataFrame
   ...: NUM_ROWS = 100000
   ...:

In [5]: NUM_COLS = 10

In [6]: col_names = ['A'+num for num in map(str,np.arange(NUM_COLS).tolist())]

In [7]: index_cols = col_names[:5]

In [8]: # Set DataFrame to have 5 level Hierarchical Index & Sort Index!
   ...: # The dtype does not matter: str and np.int64 give the same results.
   ...: df = pd.DataFrame(np.random.randint(5, size=(NUM_ROWS,NUM_COLS)), dtype=np.int64, columns=col_names)
   ...:

In [9]: df = df.set_index(index_cols).sort_index()

...

In [79]: df
Out[79]:
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 100000 entries, (0, 0, 0, 0, 0) to (4, 4, 4, 4, 4)
Data columns:
A5    100000  non-null values
A6    100000  non-null values
A7    100000  non-null values
A8    100000  non-null values
A9    100000  non-null values
dtypes: int64(5)

In [80]: df.ix[(0)]
Out[80]:
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 20011 entries, (0, 0, 0, 0) to (4, 4, 4, 4)
Data columns:
A5    20011  non-null values
A6    20011  non-null values
A7    20011  non-null values
A8    20011  non-null values
A9    20011  non-null values
dtypes: int64(5)

In [81]: df.ix[(0,1)]
Out[81]:
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 4007 entries, (0, 0, 0) to (4, 4, 4)
Data columns:
A5    4007  non-null values
A6    4007  non-null values
A7    4007  non-null values
A8    4007  non-null values
A9    4007  non-null values
dtypes: int64(5)

In [82]: df.ix[(0,1,2)]
Out[82]:
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 817 entries, (0, 0) to (4, 4)
Data columns:
A5    817  non-null values
A6    817  non-null values
A7    817  non-null values
A8    817  non-null values
A9    817  non-null values
dtypes: int64(5)

In [83]: df.ix[(0,1,2,3)]
Out[83]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 162 entries, 0 to 4
Data columns:
A5    162  non-null values
A6    162  non-null values
A7    162  non-null values
A8    162  non-null values
A9    162  non-null values
dtypes: int64(5)

In [84]: df.ix[(0,1,2,3,4)]
Out[84]:
                A5  A6  A7  A8  A9
A0 A1 A2 A3 A4
0  1  2  3  4    1   2   2   4   2
            4    1   4   4   1   0
            4    2   1   4   1   3
            4    2   4   2   1   1
            4    1   1   2   1   4
            4    0   0   2   1   1
            4    2   0   0   3   1
            4    2   2   3   3   1
            4    3   0   3   4   1
            4    1   1   0   0   1
            4    2   1   0   2   4
            4    3   4   1   2   3
            4    0   4   3   1   0
            4    4   1   4   1   2
            4    1   3   4   3   3
            4    0   1   1   3   1
            4    2   2   2   0   3
            4    0   0   1   4   0
            4    1   0   1   4   2
            4    1   4   2   2   0
            4    4   2   0   3   1
            4    2   1   2   3   2
            4    4   2   0   1   4
            4    1   4   1   1   4
            4    1   0   1   2   4
            4    2   3   0   1   3
            4    2   1   3   3   3
            4    1   2   0   4   2
            4    3   0   4   4   0
            4    4   4   2   3   0
            4    0   0   1   3   2
            4    4   0   0   0   3
            4    2   0   3   4   2
            4    3   3   3   0   2
            4    4   2   2   0   1
            4    2   1   3   4   0

In [86]: df.index.lexsort_depth
Out[86]: 5