BUG: groupby with categorical index doesn't include unobserved categories · Issue #49354 · pandas-dev/pandas
When grouping by a CategoricalIndex, unobserved categories are not included in the output, even with observed=False.
import pandas as pd

df = pd.DataFrame(
    {
        "a": pd.Categorical([1, 1, 2], categories=[1, 2, 3]),
        "a2": pd.Categorical([1, 1, 2], categories=[1, 2, 3]),
        "b": [4, 5, 6],
        "c": [7, 8, 9],
    }
).set_index(["a"])
gb = df.groupby("a", observed=False)
result = gb.sum()
print(result)
# Should include a row with 3, the unobserved category
#    b   c
# a
# 1  9  15
# 2  6   9
df = df.reset_index().set_index(["a", "a2"])
gb = df.groupby(["a", "a2"], observed=False)
result = gb.sum()
print(result)
# Should include two rows with 3, the unobserved category
#       b   c
# a a2
# 1 1   9  15
#   2   0   0
# 2 1   0   0
#   2   6   9
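
For comparison, here is a minimal sketch of a possible workaround: grouping by the categorical columns instead of by index levels. It assumes the same data as above, left as ordinary columns (no set_index), and selects "b" and "c" explicitly so the categorical columns are not summed. The names `single` and `multi` are introduced only for this illustration. This is not a fix for the issue, just a way to see the unobserved categories in the result.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "a": pd.Categorical([1, 1, 2], categories=[1, 2, 3]),
        "a2": pd.Categorical([1, 1, 2], categories=[1, 2, 3]),
        "b": [4, 5, 6],
        "c": [7, 8, 9],
    }
)

# Group by categorical *columns* (not index levels); observed=False
# keeps categories that never occur in the data.
single = df.groupby("a", observed=False)[["b", "c"]].sum()
print(single)

# With two categorical keys, the result covers every combination
# of categories, including those involving the unobserved category 3.
multi = df.groupby(["a", "a2"], observed=False)[["b", "c"]].sum()
print(multi)
```

Here `single` has one row per category of "a" (1, 2, 3), with zeros for category 3, and `multi` has nine rows, one per (a, a2) combination.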