BUG: dropna affects observed in DataFrame.groupby() since v1.5 · Issue #48645 · pandas-dev/pandas

With pandas 1.5

from pandas import DataFrame, Categorical

df = DataFrame({"x": Categorical([1, 2], categories=[1, 2, 3]), "y": [3, 4]})

df.groupby("x", observed=False).grouper.result_index

CategoricalIndex([1, 2, 3], categories=[1, 2, 3], ordered=False, dtype='category', name='x')

df.groupby("x", observed=False, dropna=False).grouper.result_index

CategoricalIndex([1, 2], categories=[1, 2, 3], ordered=False, dtype='category', name='x')

Unexpected result ↑: category 3 is missing from the result index even though observed=False.

df.groupby("x", observed=False, dropna=True).grouper.result_index

CategoricalIndex([1, 2, 3], categories=[1, 2, 3], ordered=False, dtype='category', name='x')

With pandas 1.4.4 and prior

df.groupby("x", observed=False).grouper.result_index

CategoricalIndex([1, 2, 3], categories=[1, 2, 3], ordered=False, dtype='category', name='x')

df.groupby("x", observed=False, dropna=False).grouper.result_index

CategoricalIndex([1, 2, 3], categories=[1, 2, 3], ordered=False, dtype='category', name='x')

df.groupby("x", observed=False, dropna=True).grouper.result_index

CategoricalIndex([1, 2, 3], categories=[1, 2, 3], ordered=False, dtype='category', name='x')

Passing dropna=False to DataFrame.groupby() should not change the result when observed=False: the unobserved category 3 should still appear in the result index.

Expected behavior: the same as pandas 1.4.4 and earlier.
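
The comparison above can also be run as a small check that does not rely on the internal `.grouper` attribute. This is only a sketch: `missing_categories` is a hypothetical helper name, not part of pandas, and it reads the group index from `.size()` on the assumption that this reflects the same result index as `.grouper.result_index`. No expected output is shown for `dropna=False`, because that depends on the installed pandas version.

```python
import pandas as pd


def missing_categories(df, key, **groupby_kwargs):
    """Return the categories of df[key] that are absent from the groupby result index.

    Hypothetical helper for illustration; not a pandas API.
    """
    result_index = df.groupby(key, **groupby_kwargs).size().index
    # Ignore a possible NaN group so that only real categories are compared.
    present = set(result_index.dropna())
    return [c for c in df[key].cat.categories if c not in present]


df = pd.DataFrame({"x": pd.Categorical([1, 2], categories=[1, 2, 3]), "y": [3, 4]})

# With observed=False, the expectation in this report is an empty list
# for both values of dropna.
for dropna in (True, False):
    missing = missing_categories(df, "x", observed=False, dropna=dropna)
    print(f"pandas {pd.__version__}, dropna={dropna}: missing categories = {missing}")
```

A non-empty list for `dropna=False` would indicate the behavior described in this report.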