ENH: consistent types in output of df.groupby · Issue #42795 · pandas-dev/pandas (original) (raw)

The problem that occurred to me is due to inconsistent types outputted by for values, match_df in df.groupby(names) - if names is a list of length >1, then values can be a tuple. Otherwise it's an element type, which can be e.g. string.

I had a bug where I iterating over values resulted in going over characters of a string, together with zip it just discarded most of the field.

Describe the solution you'd like

Whenever df.groupby is called with a list, output a tuple of values found. Specifically when called with a list of length 1, output a tuple of length 1 instead of unwrapped element.

API breaking implications

This would change how existing code behaves.

Describe alternatives you've considered

Leave things as they are. Unfortunately it means people need to write special case handling ifs around groupby to get consistent types outputted.

Additional context

import pandas as pd

df = pd.DataFrame(columns=['a','b','c'], index=['x','y']) df.loc['y'] = pd.Series({'a':1, 'b':5, 'c':2}) print(df)

values, _ = next(iter(df.groupby(['a', 'b']))) print(type(values)) # <class 'tuple'>

values, _ = next(iter(df.groupby(['a']))) print(type(values)) # <class 'int'> # IMHO should be 'tuple'!

values, _ = next(iter(df.groupby('a'))) print(type(values)) # <class 'int'>