BUG: groupby with categorical index doesn't include unobserved categories · Issue #49354 · pandas-dev/pandas

When grouping by a CategoricalIndex with observed=False, unobserved categories are not included in the output.

import pandas as pd

df = pd.DataFrame(
    {
        "a": pd.Categorical([1, 1, 2], categories=[1, 2, 3]),
        "a2": pd.Categorical([1, 1, 2], categories=[1, 2, 3]),
        "b": [4, 5, 6],
        "c": [7, 8, 9],
    }
).set_index(["a"])
gb = df.groupby("a", observed=False)
result = gb.sum()
print(result)
# Should include a row with 3, the unobserved category
#    b   c
# a       
# 1  9  15
# 2  6   9

df = df.reset_index().set_index(["a", "a2"])
gb = df.groupby(["a", "a2"], observed=False)
result = gb.sum()
print(result)
# Should also include rows for 3, the unobserved category, in both levels
#       b   c
# a a2       
# 1 1   9  15
#   2   0   0
# 2 1   0   0
#   2   6   9
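
For comparison, here is a minimal sketch of the behavior the comments above ask for. It assumes the same data but keeps "a" as a regular categorical column instead of an index level. The names df_col and expected are hypothetical and only used for this illustration.

```python
import pandas as pd

# Same data as the reproducer, but "a" stays a column (assumption for this sketch).
df_col = pd.DataFrame(
    {
        "a": pd.Categorical([1, 1, 2], categories=[1, 2, 3]),
        "b": [4, 5, 6],
        "c": [7, 8, 9],
    }
)

# Grouping by the categorical column with observed=False keeps category 3,
# filling its sums with 0 because no rows belong to it.
expected = df_col.groupby("a", observed=False).sum()
print(expected)
```

The result here has one row per category (1, 2 and 3), which is the shape the index-based groupby in the reproducer would be expected to match.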