BUG: groupby(..., dropna=False).indices with single group key does not include nan group


Code Sample, a copy-pastable example

In [9]: data = {'group':['g1', 'g1', 'g1', np.nan, 'g1', 'g1', 'g2', 'g2', 'g2', 'g2', np.nan],
   ...:         'A':[3, 1, 8, 2, 6, -1, 0, 13, -4, 0, 1],
   ...:         'B':[5, 2, 3, 7, 11, -1, 4,-1, 1, 0, 2]}
   ...: df = pd.DataFrame(data)
   ...: df.groupby('group',dropna=True).indices
Out[9]: {'g1': array([0, 1, 2, 4, 5]), 'g2': array([6, 7, 8, 9])}

In [10]: df.groupby('group',dropna=False).indices
Out[10]: {'g1': array([0, 1, 2, 4, 5]), 'g2': array([6, 7, 8, 9])}

In [11]: pd.__version__
Out[11]: '1.2.0.dev0+67.gaefae55e1'

Problem description

For a single group-by key, the grouping codes and indices are determined by this line:

values = Categorical(self.grouper)

Categorical does not support NaN as a category label; missing values only get the sentinel code -1, so the NaN rows never form a group of their own.
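A minimal sketch of that Categorical behaviour, using a shortened version of the 'group' column (the variable names are only for illustration):

```python
import numpy as np
import pandas as pd

# Same key pattern as the 'group' column above, shortened.
keys = ["g1", "g1", np.nan, "g2", np.nan]
cat = pd.Categorical(keys)

categories = list(cat.categories)  # NaN is not among the categories
codes = cat.codes.tolist()         # missing entries carry the sentinel -1

print(categories)
print(codes)
```

Any lookup keyed on the categories therefore has no entry under which the NaN rows could be collected.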

This works correctly if multiple group keys are passed.

Once this issue is addressed, #35542 will be fixed

Expected Output

In [10]: df.groupby('group',dropna=False).indices
Out[10]: {'g1': array([0, 1, 2, 4, 5]), 'g2': array([6, 7, 8, 9]), np.nan: array([3, 10])}
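For reference, the expected mapping can be built by hand without going through groupby. This is only a sketch for checking the result: indices_with_nan is a hypothetical helper, not a pandas API and not the proposed fix.

```python
import numpy as np
import pandas as pd


def indices_with_nan(keys):
    """Hypothetical helper: map each key, including NaN, to its row positions."""
    s = pd.Series(keys, dtype=object)
    # Positions of every non-missing key.
    out = {k: np.flatnonzero((s == k).to_numpy()) for k in s.dropna().unique()}
    # Positions of the missing keys, stored under np.nan.
    nan_pos = np.flatnonzero(s.isna().to_numpy())
    if len(nan_pos):
        out[np.nan] = nan_pos
    return out


group = ['g1', 'g1', 'g1', np.nan, 'g1', 'g1', 'g2', 'g2', 'g2', 'g2', np.nan]
expected = indices_with_nan(group)
print(expected)
```

Looking up expected[np.nan] works here because the same np.nan object is used as the key; a different NaN object would not match, since NaN does not compare equal to itself.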