REGR: GroupBy.indices no longer includes unobserved categories · Issue #38642 · pandas-dev/pandas
Does anybody know if this was an intentional change? (I can't find anything about it in the whatsnew.)
In [9]: pd.__version__
Out[9]: '1.0.5'
In [10]: df = pd.DataFrame({"key": pd.Categorical(["b"]*5, categories=["a", "b", "c", "d"]), "col": range(5)})
In [11]: gb = df.groupby("key")
In [12]: list(gb.indices)
Out[12]: ['a', 'b', 'c', 'd']
vs
In [1]: pd.__version__
Out[1]: '1.3.0.dev0+92.ga2d10ba88a'
In [2]: df = pd.DataFrame({"key": pd.Categorical(["b"]*5, categories=["a", "b", "c", "d"]), "col": range(5)})
In [3]: gb = df.groupby("key")
In [4]: list(gb.indices)
Out[4]: ['b']
This already changed in pandas 1.1, so it is not a recent change.
The consequence of this is that iterating over gb and iterating over gb.indices are no longer consistent.
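A quick way to check this on a given install is to collect the keys from both iterations and compare them. This is only a sketch: it reuses the frame from above, passes observed=False explicitly (an assumption on my side, so the result does not depend on the version's default), and the names iter_keys and index_keys are just illustrative. What it prints depends on the pandas version, so no expected output is shown.

```python
import pandas as pd

# Same frame as in the report: only "b" is observed, "a", "c" and "d" are not.
df = pd.DataFrame(
    {
        "key": pd.Categorical(["b"] * 5, categories=["a", "b", "c", "d"]),
        "col": range(5),
    }
)

# observed=False is spelled out so the comparison does not rely on the default.
gb = df.groupby("key", observed=False)

# Group labels seen when iterating over the GroupBy object itself.
iter_keys = [name for name, _ in gb]

# Group labels seen when iterating over the .indices dict.
index_keys = list(gb.indices)

print("pandas", pd.__version__)
print("iterating over gb:        ", iter_keys)
print("iterating over gb.indices:", index_keys)
print("consistent:", iter_keys == index_keys)
```

If the two lists differ, the labels that are missing from index_keys should be exactly the unobserved categories.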