ENH: per axis and per level indexing (orig GH6134) by jreback · Pull Request #6301 · pandas-dev/pandas

This is a reprise of #6134, with tests and multi-axis support; it depends on #6299.

closes #4036
closes #4116
closes #3057
closes #2598
closes #5641
closes #3738

This is the whatsnew/docs

MultiIndexing Using Slicers

In 0.14.0 we added a new way to slice multi-indexed objects. You can slice a MultiIndex by providing multiple indexers, using slice(None) to select all the contents of a level. You do not need to specify all the deeper levels; they are implied as slice(None). See the docs.

Warning

You should specify all axes in the .loc specifier, meaning the indexer for the index and for the columns. There are some ambiguous cases where the passed indexer could be misinterpreted as indexing both axes, rather than into, say, the MultiIndex for the rows.

You should do this:

df.loc[(slice('A1','A3'),.....),:]

rather than this:

df.loc[(slice('A1','A3'),.....)]

Warning

You will need to make sure that the selection axes are fully lexsorted!
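
As a minimal sketch of what "fully lexsorted" means in practice, the snippet below builds a small frame whose row labels are deliberately out of order, sorts it, and then applies a slicer. The frame and its values are hypothetical. It uses `sort_index()` and `MultiIndex.is_monotonic_increasing` from current pandas releases, which is an assumption relative to this description (the session below uses the older `sortlevel()`).

```python
import numpy as np
import pandas as pd

# Hypothetical frame: the row MultiIndex is deliberately out of order.
index = pd.MultiIndex.from_tuples(
    [("A1", "B1"), ("A0", "B0"), ("A1", "B0"), ("A0", "B1")]
)
unsorted = pd.DataFrame({"x": np.arange(4)}, index=index)

# Reports whether the labels are already in lexicographic order.
print(unsorted.index.is_monotonic_increasing)

# sort_index() orders the rows lexicographically across all levels.
sorted_df = unsorted.sort_index()
print(sorted_df.index.is_monotonic_increasing)

# On the sorted frame, slice the first level and pick "B0" on the second.
subset = sorted_df.loc[(slice("A0", "A1"), ["B0"]), :]
print(subset)
```

Sorting once up front, on every axis that will be sliced, keeps later slicer calls free of sortedness concerns.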

In [7]: def mklbl(prefix,n):
   ...:     return ["%s%s" % (prefix,i)  for i in range(n)]
   ...: 

In [8]: index = MultiIndex.from_product([mklbl('A',4),
   ...:                                  mklbl('B',2),
   ...:                                  mklbl('C',4),
   ...:                                  mklbl('D',2)])
   ...: 

In [9]: columns = MultiIndex.from_tuples([('a','foo'),('a','bar'),
   ...:                                   ('b','foo'),('b','bah')],
   ...:                                    names=['lvl0', 'lvl1'])
   ...: 

In [10]: df = DataFrame(np.arange(len(index)*len(columns)).reshape((len(index),len(columns))),
   ....:                index=index,
   ....:                columns=columns).sortlevel().sortlevel(axis=1)
   ....: 

In [11]: df
Out[11]: 
lvl0           a         b     
lvl1         bar  foo  bah  foo
A0 B0 C0 D0    1    0    3    2
         D1    5    4    7    6
      C1 D0    9    8   11   10
         D1   13   12   15   14
      C2 D0   17   16   19   18
         D1   21   20   23   22
      C3 D0   25   24   27   26
         D1   29   28   31   30
   B1 C0 D0   33   32   35   34
         D1   37   36   39   38
      C1 D0   41   40   43   42
         D1   45   44   47   46
      C2 D0   49   48   51   50
         D1   53   52   55   54
      C3 D0   57   56   59   58
             ...  ...  ...  ...

[64 rows x 4 columns]
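
The session above relies on bare `MultiIndex`/`DataFrame` names and on `sortlevel()`. For readers on a recent pandas, here is a hedged, self-contained equivalent of the same setup. It assumes `sort_index()` as the replacement for `sortlevel()`, which is not what this description itself uses.

```python
import numpy as np
import pandas as pd


def mklbl(prefix, n):
    """Return labels such as ['A0', 'A1', ...]."""
    return [f"{prefix}{i}" for i in range(n)]


# Four row levels: 4 * 2 * 4 * 2 = 64 rows.
index = pd.MultiIndex.from_product(
    [mklbl("A", 4), mklbl("B", 2), mklbl("C", 4), mklbl("D", 2)]
)
columns = pd.MultiIndex.from_tuples(
    [("a", "foo"), ("a", "bar"), ("b", "foo"), ("b", "bah")],
    names=["lvl0", "lvl1"],
)

values = np.arange(len(index) * len(columns)).reshape((len(index), len(columns)))
df = pd.DataFrame(values, index=index, columns=columns)

# Sort both axes so that label-based slicers can be applied to them.
df = df.sort_index().sort_index(axis=1)

print(df.shape)
print(df.head())
```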

In [12]: df.loc[(slice('A1','A3'),slice(None), ['C1','C3']),:]
Out[12]: 
lvl0           a         b     
lvl1         bar  foo  bah  foo
A1 B0 C1 D0   73   72   75   74
         D1   77   76   79   78
      C3 D0   89   88   91   90
         D1   93   92   95   94
   B1 C1 D0  105  104  107  106
         D1  109  108  111  110
      C3 D0  121  120  123  122
         D1  125  124  127  126
A2 B0 C1 D0  137  136  139  138
         D1  141  140  143  142
      C3 D0  153  152  155  154
         D1  157  156  159  158
   B1 C1 D0  169  168  171  170
         D1  173  172  175  174
      C3 D0  185  184  187  186
             ...  ...  ...  ...

[16 rows x 4 columns]

In [13]: df.loc[(slice(None),slice(None), ['C1','C3']),:]
Out[13]: 
lvl0           a         b     
lvl1         bar  foo  bah  foo
A0 B0 C1 D0    9    8   11   10
         D1   13   12   15   14
      C3 D0   25   24   27   26
         D1   29   28   31   30
   B1 C1 D0   41   40   43   42
         D1   45   44   47   46
      C3 D0   57   56   59   58
         D1   61   60   63   62
A1 B0 C1 D0   73   72   75   74
         D1   77   76   79   78
      C3 D0   89   88   91   90
         D1   93   92   95   94
   B1 C1 D0  105  104  107  106
         D1  109  108  111  110
      C3 D0  121  120  123  122
             ...  ...  ...  ...

[32 rows x 4 columns]
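
The following sketch checks two points from the text on a reduced frame: selecting by a list on the third level, and leaving out deeper levels so that they are implied as slice(None). The two plain columns `x` and `y` are hypothetical stand-ins for the four MultiIndex columns used above.

```python
import numpy as np
import pandas as pd

# Same four row levels as in the session above; columns simplified (assumption).
index = pd.MultiIndex.from_product(
    [
        [f"A{i}" for i in range(4)],
        [f"B{i}" for i in range(2)],
        [f"C{i}" for i in range(4)],
        [f"D{i}" for i in range(2)],
    ]
)
df = pd.DataFrame(
    np.arange(len(index) * 2).reshape(len(index), 2),
    index=index,
    columns=["x", "y"],
)

# Any A, any B, only C1 and C3; the D level is left out and therefore implied.
picked = df.loc[(slice(None), slice(None), ["C1", "C3"]), :]
print(len(picked))

# Spelling out the deeper levels selects the same rows as omitting them.
short = df.loc[(slice(None), ["B0"]), :]
full = df.loc[(slice(None), ["B0"], slice(None), slice(None)), :]
print(short.equals(full))
```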

It is possible to perform quite complicated selections using this method on multiple axes at the same time.

In [14]: df.loc['A1',(slice(None),'foo')]
Out[14]: 
lvl0        a    b
lvl1      foo  foo
B0 C0 D0   64   66
      D1   68   70
   C1 D0   72   74
      D1   76   78
   C2 D0   80   82
      D1   84   86
   C3 D0   88   90
      D1   92   94
B1 C0 D0   96   98
      D1  100  102
   C1 D0  104  106
      D1  108  110
   C2 D0  112  114
      D1  116  118
   C3 D0  120  122
          ...  ...

[16 rows x 2 columns]

In [15]: df.loc[(slice(None),slice(None), ['C1','C3']),(slice(None),'foo')]
Out[15]: 
lvl0           a    b
lvl1         foo  foo
A0 B0 C1 D0    8   10
         D1   12   14
      C3 D0   24   26
         D1   28   30
   B1 C1 D0   40   42
         D1   44   46
      C3 D0   56   58
         D1   60   62
A1 B0 C1 D0   72   74
         D1   76   78
      C3 D0   88   90
         D1   92   94
   B1 C1 D0  104  106
         D1  108  110
      C3 D0  120  122
             ...  ...

[32 rows x 2 columns]
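
Nested `slice(None)` tuples become hard to read when both axes are indexed. pandas also provides `pd.IndexSlice`, which builds the same tuples from ordinary `:` syntax. It is not mentioned in this description, so treat the sketch below as an optional alternative spelling rather than part of this change. The small frame is hypothetical.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([["A0", "A1"], ["B0", "B1"], ["C0", "C1"]])
columns = pd.MultiIndex.from_tuples(
    [("a", "bar"), ("a", "foo"), ("b", "bah"), ("b", "foo")],
    names=["lvl0", "lvl1"],
)
df = pd.DataFrame(np.arange(32).reshape(8, 4), index=index, columns=columns)

# Explicit form, as used throughout this description.
explicit = df.loc[(slice(None), slice(None), ["C1"]), (slice(None), "foo")]

# Equivalent form: IndexSlice turns ':' into slice(None) inside the tuple.
idx = pd.IndexSlice
compact = df.loc[idx[:, :, ["C1"]], idx[:, "foo"]]

print(explicit)
print(compact.equals(explicit))
```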

Furthermore, you can set the values using these methods:

In [16]: df2 = df.copy()

In [17]: df2.loc[(slice(None),slice(None), ['C1','C3']),:] = -10

In [18]: df2
Out[18]: 
lvl0           a         b     
lvl1         bar  foo  bah  foo
A0 B0 C0 D0    1    0    3    2
         D1    5    4    7    6
      C1 D0  -10  -10  -10  -10
         D1  -10  -10  -10  -10
      C2 D0   17   16   19   18
         D1   21   20   23   22
      C3 D0  -10  -10  -10  -10
         D1  -10  -10  -10  -10
   B1 C0 D0   33   32   35   34
         D1   37   36   39   38
      C1 D0  -10  -10  -10  -10
         D1  -10  -10  -10  -10
      C2 D0   49   48   51   50
         D1   53   52   55   54
      C3 D0  -10  -10  -10  -10
             ...  ...  ...  ...

[64 rows x 4 columns]
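
A compact way to convince yourself that the assignment only touches the selected rows is to run it on a tiny frame and compare against the untouched original. The frame below is hypothetical and uses plain columns for brevity.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([["A0", "A1"], ["B0", "B1"]])
df = pd.DataFrame(np.arange(8).reshape(4, 2), index=index, columns=["x", "y"])

# Work on a copy so the original stays available for comparison.
df2 = df.copy()

# Overwrite every row whose second level is B1, across all columns.
df2.loc[(slice(None), ["B1"]), :] = -10

print(df2)
```

Rows labelled B0 keep their original values, and `df` itself is unchanged because the assignment was made on the copy.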

You can use an alignable object as the right-hand side as well.

In [19]: df2 = df.copy()

In [20]: df2.loc[(slice(None),slice(None), ['C1','C3']),:] = df2*1000

In [21]: df2
Out[21]: 
lvl0             a             b       
lvl1           bar    foo    bah    foo
A0 B0 C0 D0      1      0      3      2
         D1      5      4      7      6
      C1 D0   1000      0   3000   2000
         D1   5000   4000   7000   6000
      C2 D0     17     16     19     18
         D1     21     20     23     22
      C3 D0   9000   8000  11000  10000
         D1  13000  12000  15000  14000
   B1 C0 D0     33     32     35     34
         D1     37     36     39     38
      C1 D0  17000  16000  19000  18000
         D1  21000  20000  23000  22000
      C2 D0     49     48     51     50
         D1     53     52     55     54
      C3 D0  25000  24000  27000  26000
               ...    ...    ...    ...

[64 rows x 4 columns]