additional keys in groupby indices when NAs are present · Issue #9304 · pandas-dev/pandas
In [386]: h = pd.DataFrame({'a':[1,2,1,np.nan,1], 'b':[1,2,3,3,2], 'c':[2,3,1,4,2]})
In [387]: gh=h.groupby(['a', 'b'])
In [388]: gh.groups.keys()
Out[388]: [(1.0, 2), (nan, 3), (1.0, 3), (1.0, 1), (2.0, 2)]
In [389]: gh.indices.keys()
Out[389]: [(1.0, 2), (1.0, 3), (2.0, 3), (1.0, 1), (2.0, 2)] # Incorrect
The tuple (2.0, 3) should not be here: no row has a == 2.0 and b == 3.
The problem goes away when there are no NAs.
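
For anyone who wants to check this on their own install, here is a minimal sketch of a standalone script. It rebuilds the frame from the session above and lists any key in `gh.indices` that has no NA in it yet matches no actual row. The names `observed` and `spurious` are only for illustration, and what gets printed will depend on the pandas version, so no expected output is shown.

```python
import numpy as np
import pandas as pd

# Same frame as in the session above.
h = pd.DataFrame({'a': [1, 2, 1, np.nan, 1],
                  'b': [1, 2, 3, 3, 2],
                  'c': [2, 3, 1, 4, 2]})
gh = h.groupby(['a', 'b'])

# (a, b) pairs that really occur in rows with no NA in the grouping columns.
observed = set(
    h.dropna(subset=['a', 'b'])[['a', 'b']].itertuples(index=False, name=None)
)

# Keys of gh.indices that contain no NA but do not correspond to any row.
spurious = [
    key for key in gh.indices
    if not any(pd.isna(v) for v in key) and key not in observed
]
print(spurious)
```

If the list is non-empty, the install shows the behaviour reported here; an empty list means every non-NA key in `gh.indices` maps to a real row.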