BUG: dropna affects observed in DataFrame.groupby() since v1.5 · Issue #48645 · pandas-dev/pandas
With pandas 1.5:
from pandas import DataFrame, Categorical
df = DataFrame({"x": Categorical([1, 2], categories=[1, 2, 3]), "y": [3, 4]})
df.groupby("x", observed=False).grouper.result_index
CategoricalIndex([1, 2, 3], categories=[1, 2, 3], ordered=False, dtype='category', name='x')
df.groupby("x", observed=False, dropna=False).grouper.result_index
CategoricalIndex([1, 2], categories=[1, 2, 3], ordered=False, dtype='category', name='x')
↑ Unexpected result: the unobserved category 3 is missing from the index even though observed=False.
df.groupby("x", observed=False, dropna=True).grouper.result_index
CategoricalIndex([1, 2, 3], categories=[1, 2, 3], ordered=False, dtype='category', name='x')
With pandas 1.4.4 and prior:
df.groupby("x", observed=False).grouper.result_index
CategoricalIndex([1, 2, 3], categories=[1, 2, 3], ordered=False, dtype='category', name='x')
df.groupby("x", observed=False, dropna=False).grouper.result_index
CategoricalIndex([1, 2, 3], categories=[1, 2, 3], ordered=False, dtype='category', name='x')
df.groupby("x", observed=False, dropna=True).grouper.result_index
CategoricalIndex([1, 2, 3], categories=[1, 2, 3], ordered=False, dtype='category', name='x')
dropna=False in DataFrame.groupby() should not affect the results when observed=False.
The expected behavior is that of pandas 1.4.4 and prior.
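
The expectation above can be sketched as a small check that does not depend on the `grouper` attribute. This is only an illustration: it assumes that the index of the public `.size()` result reflects the same group labels as `grouper.result_index`, and the helper name `group_keys` is hypothetical. On an affected version, the comparison for `dropna=False` would presumably fail.

```python
from pandas import Categorical, DataFrame

# Same frame as in the report: category 3 is declared but never observed.
df = DataFrame({"x": Categorical([1, 2], categories=[1, 2, 3]), "y": [3, 4]})


def group_keys(**kwargs):
    """Return the group labels of groupby("x", observed=False, **kwargs).

    Hypothetical helper; it reads the labels from the index of .size(),
    assumed here to match grouper.result_index.
    """
    return list(df.groupby("x", observed=False, **kwargs).size().index)


keys_default = group_keys()
keys_keep_na = group_keys(dropna=False)
keys_drop_na = group_keys(dropna=True)

# With observed=False, all three calls are expected to list every category.
print(keys_default, keys_keep_na, keys_drop_na)
```

Comparing `keys_keep_na` and `keys_drop_na` against `keys_default` gives a quick way to confirm whether a given pandas installation shows the behavior described in this report.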