
ENH: enumerate groups · Issue #11642 · pandas-dev/pandas

Sometimes it's handy to have access to a distinct integer for each group. For example, using the (internal) grouper:

>>> df = pd.DataFrame({"a": list("xyyzxy"), "b": list("ab"*3), "c": range(6)})
>>> df["group_id"] = df.groupby(["a","b"]).grouper.group_info[0]
>>> df
   a  b  c  group_id
0  x  a  0         0
1  y  b  1         2
2  y  a  2         1
3  z  b  3         3
4  x  a  4         0
5  y  b  5         2

This can be achieved in a number of ways, but none of them is particularly elegant, especially when grouping on multiple keys and/or Series. Accordingly, after a brief discussion on gitter, I propose a new method, transform("enumerate"), which returns a Series of integers from 0 to ngroups-1 matching the order in which the groups will be iterated. In other words, we'd simply be applying the following map:

>>> m = {k: i for i, (k,g) in enumerate(df.groupby(["a","b"]))}
>>> m
{('x', 'a'): 0, ('y', 'b'): 2, ('y', 'a'): 1, ('z', 'b'): 3}

(Note this is only to show the desired behaviour; it isn't how it would be implemented!)
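
To make the intended result concrete, here is a minimal sketch that applies such a map row by row. It is illustrative only: the names `keys`, `group_id` and `ngroup_ids` are made up for this example, and the row-wise lookup is deliberately naive. The last line assumes a pandas release that provides `GroupBy.ngroup()`, which numbers groups in the same iteration order.

```python
import pandas as pd

df = pd.DataFrame({"a": list("xyyzxy"), "b": list("ab" * 3), "c": range(6)})
keys = ["a", "b"]  # hypothetical name for the grouping columns

# Map each group key to its position in groupby iteration order.
m = {k: i for i, (k, g) in enumerate(df.groupby(keys))}

# Naive row-wise lookup: build each row's key tuple and look it up in the map.
group_id = [m[k] for k in zip(df["a"], df["b"])]
df["group_id"] = group_id

# Assumption: a pandas version with GroupBy.ngroup(), which should give
# the same numbering without building the intermediate dict.
ngroup_ids = df.groupby(keys).ngroup().tolist()
```

The dict-based version iterates over every group and every row in Python, so it only serves to pin down the semantics; a real implementation would reuse the group labels the grouper has already computed.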