ENH: per axis and per level indexing (orig GH6134) by jreback · Pull Request #6301 · pandas-dev/pandas
This is a reprise of #6134, with tests and multi-axis support; it depends on #6299
closes #4036
closes #4116
closes #3057
closes #2598
closes #5641
closes #3738
- docs
- v0.14.0 example
- release notes
- setting assignment to slice #5641, Assignment with MultiIndex replaces dataframe contents with NaNs #3738
This is the whatsnew/docs
MultiIndexing Using Slicers
In 0.14.0 we added a new way to slice multi-indexed objects. You can slice a MultiIndex by providing multiple indexers, one per level. Use slice(None) to select all the contents of a level. You do not need to specify all the deeper levels; they are implied as slice(None). See the docs.
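As a minimal, self-contained sketch of the slicer syntax described above (assuming a recent pandas imported as `pd` and NumPy as `np`; the small frame and its labels are illustrative and not taken from this PR):

```python
import numpy as np
import pandas as pd

# Small 3-level row index: 2 x 2 x 3 = 12 rows, already in sorted order.
index = pd.MultiIndex.from_product(
    [["A0", "A1"], ["B0", "B1"], ["C0", "C1", "C2"]]
)
df = pd.DataFrame(
    np.arange(len(index) * 2).reshape(len(index), 2),
    index=index,
    columns=["x", "y"],
).sort_index()

# slice(None) selects everything on a level; a list selects specific labels.
explicit = df.loc[(slice(None), slice(None), ["C0", "C2"]), :]

# Trailing levels may be omitted; they are treated as slice(None).
implied = df.loc[(slice("A0", "A1"), ["B1"]), :]
spelled_out = df.loc[(slice("A0", "A1"), ["B1"], slice(None)), :]
```

Here `implied` and `spelled_out` should select the same rows, since the omitted third level defaults to slice(None).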
Warning
You should specify all axes in the .loc specifier, meaning the indexer for the index and for the columns. There are some ambiguous cases where the passed indexer could be misinterpreted as indexing both axes, rather than into, say, the MultiIndex for the rows.
You should do this:
df.loc[(slice('A1','A3'),.....),:]
rather than this:
df.loc[(slice('A1','A3'),.....)]
Warning
You will need to make sure that the selection axes are fully lexsorted!
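One way to satisfy the sorting requirement, sketched under the assumption of a recent pandas where `sort_index()` sorts a MultiIndex on all levels and `is_monotonic_increasing` reports whether it is sorted (the frame and labels below are illustrative only):

```python
import numpy as np
import pandas as pd

# Deliberately unsorted two-level index.
index = pd.MultiIndex.from_tuples(
    [("A1", "B1"), ("A0", "B0"), ("A1", "B0"), ("A0", "B1")]
)
df = pd.DataFrame({"value": np.arange(4)}, index=index)

# False here, because the index labels are not in sorted order.
was_sorted = df.index.is_monotonic_increasing

# Sort on all levels before using slicers.
df_sorted = df.sort_index()
selected = df_sorted.loc[(slice("A0", "A1"), ["B1"]), :]
```

Sorting once up front and keeping the sorted frame around avoids repeating the check before every selection.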
In [7]: def mklbl(prefix,n):
   ...:     return ["%s%s" % (prefix,i) for i in range(n)]
   ...:
In [8]: index = MultiIndex.from_product([mklbl('A',4),
   ...:                                  mklbl('B',2),
   ...:                                  mklbl('C',4),
   ...:                                  mklbl('D',2)])
   ...:
In [9]: columns = MultiIndex.from_tuples([('a','foo'),('a','bar'),
   ...:                                   ('b','foo'),('b','bah')],
   ...:                                  names=['lvl0', 'lvl1'])
   ...:
In [10]: df = DataFrame(np.arange(len(index)*len(columns)).reshape((len(index),len(columns))),
   ....:                index=index,
   ....:                columns=columns).sortlevel().sortlevel(axis=1)
   ....:
In [11]: df
Out[11]:
lvl0           a         b
lvl1         bar  foo  bah  foo
A0 B0 C0 D0    1    0    3    2
         D1    5    4    7    6
      C1 D0    9    8   11   10
         D1   13   12   15   14
      C2 D0   17   16   19   18
         D1   21   20   23   22
      C3 D0   25   24   27   26
         D1   29   28   31   30
   B1 C0 D0   33   32   35   34
         D1   37   36   39   38
      C1 D0   41   40   43   42
         D1   45   44   47   46
      C2 D0   49   48   51   50
         D1   53   52   55   54
      C3 D0   57   56   59   58
             ...  ...  ...  ...
[64 rows x 4 columns]
In [12]: df.loc[(slice('A1','A3'),slice(None), ['C1','C3']),:]
Out[12]:
lvl0           a         b
lvl1         bar  foo  bah  foo
A1 B0 C1 D0   73   72   75   74
         D1   77   76   79   78
      C3 D0   89   88   91   90
         D1   93   92   95   94
   B1 C1 D0  105  104  107  106
         D1  109  108  111  110
      C3 D0  121  120  123  122
         D1  125  124  127  126
A2 B0 C1 D0  137  136  139  138
         D1  141  140  143  142
      C3 D0  153  152  155  154
         D1  157  156  159  158
   B1 C1 D0  169  168  171  170
         D1  173  172  175  174
      C3 D0  185  184  187  186
             ...  ...  ...  ...
[16 rows x 4 columns]
In [13]: df.loc[(slice(None),slice(None), ['C1','C3']),:]
Out[13]:
lvl0           a         b
lvl1         bar  foo  bah  foo
A0 B0 C1 D0    9    8   11   10
         D1   13   12   15   14
      C3 D0   25   24   27   26
         D1   29   28   31   30
   B1 C1 D0   41   40   43   42
         D1   45   44   47   46
      C3 D0   57   56   59   58
         D1   61   60   63   62
A1 B0 C1 D0   73   72   75   74
         D1   77   76   79   78
      C3 D0   89   88   91   90
         D1   93   92   95   94
   B1 C1 D0  105  104  107  106
         D1  109  108  111  110
      C3 D0  121  120  123  122
             ...  ...  ...  ...
[32 rows x 4 columns]
It is possible to perform quite complicated selections using this method on multiple axes at the same time.
In [14]: df.loc['A1',(slice(None),'foo')]
Out[14]:
lvl0        a    b
lvl1      foo  foo
B0 C0 D0   64   66
      D1   68   70
   C1 D0   72   74
      D1   76   78
   C2 D0   80   82
      D1   84   86
   C3 D0   88   90
      D1   92   94
B1 C0 D0   96   98
      D1  100  102
   C1 D0  104  106
      D1  108  110
   C2 D0  112  114
      D1  116  118
   C3 D0  120  122
          ...  ...
[16 rows x 2 columns]
In [15]: df.loc[(slice(None),slice(None), ['C1','C3']),(slice(None),'foo')]
Out[15]:
lvl0           a    b
lvl1         foo  foo
A0 B0 C1 D0    8   10
         D1   12   14
      C3 D0   24   26
         D1   28   30
   B1 C1 D0   40   42
         D1   44   46
      C3 D0   56   58
         D1   60   62
A1 B0 C1 D0   72   74
         D1   76   78
      C3 D0   88   90
         D1   92   94
   B1 C1 D0  104  106
         D1  108  110
      C3 D0  120  122
             ...  ...
[32 rows x 2 columns]
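The two-axis selection above can also be tried on a much smaller frame. This is only a sketch, assuming a recent pandas (where `sort_index` takes the place of `sortlevel`); the 4 x 4 frame is illustrative:

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([["A0", "A1"], ["B0", "B1"]])
columns = pd.MultiIndex.from_tuples(
    [("a", "foo"), ("a", "bar"), ("b", "foo"), ("b", "bah")],
    names=["lvl0", "lvl1"],
)
df = pd.DataFrame(
    np.arange(16).reshape(4, 4), index=index, columns=columns
).sort_index().sort_index(axis=1)

# Rows: any first-level label, second-level label "B1".
# Columns: any lvl0 label, lvl1 label "foo".
result = df.loc[(slice(None), ["B1"]), (slice(None), "foo")]
```

Each axis gets its own tuple of per-level indexers, so the row and column selections stay independent of each other.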
Furthermore, you can set the values using these methods:
In [16]: df2 = df.copy()
In [17]: df2.loc[(slice(None),slice(None), ['C1','C3']),:] = -10
In [18]: df2
Out[18]:
lvl0           a         b
lvl1         bar  foo  bah  foo
A0 B0 C0 D0    1    0    3    2
         D1    5    4    7    6
      C1 D0  -10  -10  -10  -10
         D1  -10  -10  -10  -10
      C2 D0   17   16   19   18
         D1   21   20   23   22
      C3 D0  -10  -10  -10  -10
         D1  -10  -10  -10  -10
   B1 C0 D0   33   32   35   34
         D1   37   36   39   38
      C1 D0  -10  -10  -10  -10
         D1  -10  -10  -10  -10
      C2 D0   49   48   51   50
         D1   53   52   55   54
      C3 D0  -10  -10  -10  -10
             ...  ...  ...  ...
[64 rows x 4 columns]
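A compact version of the scalar assignment, as a sketch on an illustrative 4-row frame (assuming a recent pandas; none of the names below come from the PR):

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([["A0", "A1"], ["B0", "B1"]])
df = pd.DataFrame(
    np.arange(8).reshape(4, 2), index=index, columns=["x", "y"]
).sort_index()

df2 = df.copy()
# Overwrite every row whose second-level label is "B1"; other rows are untouched.
df2.loc[(slice(None), ["B1"]), :] = -10
```

Working on a copy, as the session above does, keeps the original frame available for comparison.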
You can use an alignable object as the right-hand side as well.
In [19]: df2 = df.copy()
In [20]: df2.loc[(slice(None),slice(None), ['C1','C3']),:] = df2*1000
In [21]: df2
Out[21]:
lvl0             a             b
lvl1           bar    foo    bah    foo
A0 B0 C0 D0      1      0      3      2
         D1      5      4      7      6
      C1 D0   1000      0   3000   2000
         D1   5000   4000   7000   6000
      C2 D0     17     16     19     18
         D1     21     20     23     22
      C3 D0   9000   8000  11000  10000
         D1  13000  12000  15000  14000
   B1 C0 D0     33     32     35     34
         D1     37     36     39     38
      C1 D0  17000  16000  19000  18000
         D1  21000  20000  23000  22000
      C2 D0     49     48     51     50
         D1     53     52     55     54
      C3 D0  25000  24000  27000  26000
               ...    ...    ...    ...
[64 rows x 4 columns]
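A small sketch of assigning an alignable right-hand side, assuming a recent pandas in which a DataFrame on the right-hand side of a `.loc` assignment is aligned to the selected rows and columns by label (the frame is illustrative):

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([["A0", "A1"], ["B0", "B1"]])
df = pd.DataFrame(
    np.arange(8).reshape(4, 2), index=index, columns=["x", "y"]
).sort_index()

df2 = df.copy()
# The right-hand side covers the whole frame; only the rows selected on the
# left (second-level label "B1") receive their label-matched values.
df2.loc[(slice(None), ["B1"]), :] = df * 1000
```

Under that assumption, each selected row ends up holding its own original values multiplied by 1000, and unselected rows are left unchanged.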