ENH: implement drop_levels argument in loc · Issue #35418 · pandas-dev/pandas

xref #6249, #18262, #35301

Following up on #35301. This is an outstanding piece of DataFrame.xs functionality that is currently not available in .loc. Since xs is slated for deprecation, I'd like to implement this in .loc.

Example

Let's define a DataFrame with a MultiIndex (example from the xs docs):

In [1]: import pandas as pd
   ...:  
   ...: d = {  
   ...:     'num_legs': [4, 4, 2, 2],  
   ...:     'num_wings': [0, 0, 2, 2],  
   ...:     'class': ['mammal', 'mammal', 'mammal', 'bird'],  
   ...:     'animal': ['cat', 'dog', 'bat', 'penguin'],  
   ...:     'locomotion': ['walks', 'walks', 'flies', 'walks'] 
   ...:     }  
   ...:  
   ...: df = pd.DataFrame(data=d) 
   ...: df.set_index(['class', 'animal', 'locomotion'], inplace=True)                                                 

In [2]: df                                                                                                            
Out[2]: 
                           num_legs  num_wings
class  animal  locomotion                     
mammal cat     walks              4          0
       dog     walks              4          0
       bat     flies              2          2
bird   penguin walks              2          2

xs and .loc both let us take a cross-section through a MultiIndex:

In [3]: import pandas._testing as tm 
    ...: 
    ...: res_xs = df.xs(('mammal', slice(None))) 
    ...: res_loc  = df.loc[('mammal', slice(None))] 
    ...: tm.assert_frame_equal(res_xs, res_loc)                                                                       

However, only xs lets us choose to keep the index levels we are slicing through, via the drop_level argument:

In [4]: df.xs(('mammal', slice(None), 'flies'), drop_level=False)                                                    
Out[4]: 
                          num_legs  num_wings
class  animal locomotion                     
mammal bat    flies              2          2
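For comparison, here is a minimal sketch of how .loc can be coaxed into keeping all levels today. It assumes that a row key containing a slice, combined with an explicit column indexer, is routed through MultiIndex slicing and therefore keeps every level. This is a workaround based on my understanding of current behaviour, not the proposed API, and it gives no way to opt in or out explicitly:

```python
import pandas as pd

d = {
    'num_legs': [4, 4, 2, 2],
    'num_wings': [0, 0, 2, 2],
    'class': ['mammal', 'mammal', 'mammal', 'bird'],
    'animal': ['cat', 'dog', 'bat', 'penguin'],
    'locomotion': ['walks', 'walks', 'flies', 'walks'],
}
df = pd.DataFrame(data=d).set_index(['class', 'animal', 'locomotion'])

# pd.IndexSlice builds the tuple ('mammal', slice(None), 'flies').
idx = pd.IndexSlice

# Assumption: passing the column indexer explicitly (the trailing ", :")
# makes .loc treat the tuple as a row key over the MultiIndex levels,
# and the slice in the key means no level is dropped from the result.
res = df.loc[idx['mammal', :, 'flies'], :]
print(res)
```

Whether levels are kept here depends on the shape of the key rather than on an explicit flag, which is part of why a drop_level-style argument on .loc seems worth having.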

New API

I'm looking for ideas on what's appropriate here. Based on the suggestions in #6249, would something like this work?

In [ ]: df.loc(drop_levels=False)[('mammal', slice(None), 'flies')]      
Out[ ]: 
                          num_legs  num_wings
class  animal locomotion                     
mammal bat    flies              2          2