API: DataFrameGroupBy column subset selection with single list? · Issue #23566 · pandas-dev/pandas

I wouldn't be surprised if there is already an issue about this, but couldn't directly find one.

When selecting a subset of columns on a DataFrameGroupBy object, both a plain comma-separated sequence (which arrives as a tuple inside the __getitem__ [] brackets) and double square brackets (a list inside the __getitem__ [] brackets) seem to work:

In [1]: import numpy as np

In [2]: import pandas as pd

In [6]: df = pd.DataFrame(np.random.randint(10, size=(10, 4)), columns=['a', 'b', 'c', 'd'])

In [8]: df.groupby('a').sum()
Out[8]: 
    b   c   d
a            
0   0   5   7
3  18   6  12
4  16   6   9
6  10  11  11
9   3   3   0

In [9]: df.groupby('a')['b', 'c'].sum()
Out[9]: 
    b   c
a        
0   0   5
3  18   6
4  16   6
6  10  11
9   3   3

In [10]: df.groupby('a')[['b', 'c']].sum()
Out[10]: 
    b   c
a        
0   0   5
3  18   6
4  16   6
6  10  11
9   3   3
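
To make the difference between the two spellings concrete, here is a minimal sketch of what Python itself hands to __getitem__ in each case. KeyRecorder is a hypothetical helper written only for this illustration and is not part of pandas; it just returns the key it receives.

```python
class KeyRecorder:
    """Hypothetical helper: returns the key exactly as Python passes it to __getitem__."""

    def __getitem__(self, key):
        return key


rec = KeyRecorder()

# Comma-separated items inside [] are packed into a single tuple.
tuple_key = rec['b', 'c']

# An explicit list inside [] is passed through unchanged as a list.
list_key = rec[['b', 'c']]

print(type(tuple_key).__name__, tuple_key)
print(type(list_key).__name__, list_key)
```

So any object that accepts both forms, as DataFrameGroupBy does above, has to treat a tuple key and a list key as meaning the same thing.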

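Personally I find df.groupby('a')['b', 'c'].sum() a bit strange, and inconsistent with how DataFrame indexing works: on a plain DataFrame, df['b', 'c'] is looked up as the single key ('b', 'c') rather than as a selection of two columns.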

Of course, on a DataFrameGroupBy you don't have the possible confusion with indexing multiple dimensions (rows, columns), but still.
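
A small sketch of the DataFrame side of that comparison, using fixed data instead of the random frame above so it is reproducible. It assumes a frame with ordinary (non-MultiIndex) columns. The tuple spelling on the groupby is deliberately left out, since whether it is accepted may depend on the pandas version.

```python
import pandas as pd

# Fixed data, chosen only for this illustration.
df = pd.DataFrame({
    'a': [0, 0, 1, 1],
    'b': [1, 2, 3, 4],
    'c': [5, 6, 7, 8],
    'd': [9, 10, 11, 12],
})

# A list inside [] selects multiple columns on a DataFrame.
subset = df[['b', 'c']]

# A bare tuple is treated as one column label, which does not exist here.
try:
    df['b', 'c']
    tuple_lookup_failed = False
except KeyError:
    tuple_lookup_failed = True

# The list spelling on the groupby gives the same result as
# selecting the columns on the DataFrame first.
grouped = df.groupby('a')[['b', 'c']].sum()
expected = df[['a', 'b', 'c']].groupby('a').sum()

print(tuple_lookup_failed)
print(grouped.equals(expected))
```

The double-bracket form therefore behaves the same way on both objects, which is the consistency argument for preferring it.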

cc @jreback @WillAyd