ENH: enumerate groups · Issue #11642 · pandas-dev/pandas
Sometimes it's handy to have access to a distinct integer for each group. For example, using the (internal) grouper:
>>> df = pd.DataFrame({"a": list("xyyzxy"), "b": list("ab"*3), "c": range(6)})
>>> df["group_id"] = df.groupby(["a","b"]).grouper.group_info[0]
>>> df
   a  b  c  group_id
0  x  a  0         0
1  y  b  1         2
2  y  a  2         1
3  z  b  3         3
4  x  a  4         0
5  y  b  5         2
This can be achieved in a number of ways, but none of them is particularly elegant, especially when grouping on multiple keys and/or Series. Accordingly, after a brief discussion on Gitter, I propose a new method, transform("enumerate"), which returns a Series of integers from 0 to ngroups-1, matching the order in which the groups are iterated. In other words, we'd simply be applying the following map:
>>> m = {k: i for i, (k,g) in enumerate(df.groupby(["a","b"]))}
>>> m
{('x', 'a'): 0, ('y', 'b'): 2, ('y', 'a'): 1, ('z', 'b'): 3}
(Note this is only to show the desired behaviour, and isn't how it would be implemented!)
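To make the intended result concrete, here is a minimal sketch that applies the map to every row using only public API. The names `keys`, `row_keys` and `group_id` are just illustrative, and this is no more an implementation proposal than the map itself:

```python
import pandas as pd

df = pd.DataFrame({"a": list("xyyzxy"), "b": list("ab" * 3), "c": range(6)})
keys = ["a", "b"]  # illustrative name for the grouping columns

# Map each group key to its position in groupby iteration order
# (groupby sorts the keys by default, so this is sorted-key order).
m = {k: i for i, (k, _) in enumerate(df.groupby(keys))}

# Look up every row's key tuple in the map to get its group id.
row_keys = zip(*(df[k] for k in keys))
group_id = pd.Series([m[k] for k in row_keys], index=df.index, name="group_id")

print(group_id.tolist())  # [0, 2, 1, 3, 0, 2]
```

The values match the `group_id` column obtained from the internal grouper above. On pandas versions that provide `GroupBy.ngroup()`, `df.groupby(keys).ngroup()` should give the same numbering.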