groupby on NaN-only column gives IndexError · Issue #11016 · pandas-dev/pandas
Using pandas from master (0.16.2+590.g81b647f) in Python 3.4.2, the following code raises IndexError: index out of bounds:
import pandas as pd, numpy as np
df = pd.DataFrame(dict(a=[np.nan]*3, b=[1,2,3]))
g = df.groupby(('a', 'b'))
len(g) # IndexError
The same problem occurs when calling list(g) instead. Since NaN values are skipped according to the documentation, I guess the correct answer would be zero for len(g) and an empty list for list(g).
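To make the expected result concrete, here is a minimal sketch that counts the distinct key combinations by hand, assuming that "skipped" means any row with a NaN in one of the key columns is dropped. The helper expected_group_count is a hypothetical name used only for illustration; it is not part of pandas and does not reflect how groupby is implemented internally.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [np.nan] * 3, "b": [1, 2, 3]})


def expected_group_count(frame, keys):
    """Hypothetical helper: count distinct key combinations,
    ignoring every row in which any key column is NaN."""
    return len(frame[keys].dropna().drop_duplicates())


count_ab = expected_group_count(df, ["a", "b"])  # every row has NaN in 'a'
count_a = expected_group_count(df, ["a"])        # all keys are NaN
count_b = expected_group_count(df, ["b"])        # three distinct keys

print(count_ab, count_a, count_b)
```

Under that assumption, grouping by both columns should yield no groups at all, which is the value I would expect len(g) to report.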
Strangely, iteration works, so for x in g: pass (or [x for x in g]) does not give an error (and iterates zero times). Also, g.count(), g.sum() etc. work (and return an empty DataFrame).
To add to the confusion, g.groups gives the dictionary {(nan, 1): [0], (nan, 2): [1], (nan, 3): [2]}. Shouldn’t this be empty because group keys with NaNs are dropped?
Grouping only by column 'a' or 'b' works and results in a length of 0 or 3, respectively.
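For comparison, the cases that do work can be checked with a short script. This is only a sketch: it passes the two keys as a list rather than a tuple (an assumption on my part, since other pandas versions may interpret a tuple as a single key), and the exact results may differ between pandas versions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [np.nan] * 3, "b": [1, 2, 3]})

# Single-key grouping: the all-NaN column yields no groups,
# while the column without NaNs yields one group per distinct value.
len_a = len(df.groupby("a"))
len_b = len(df.groupby("b"))

# Aggregating over both keys works and returns an empty result.
# Assumption: keys are passed as a list instead of the tuple used above.
summed = df.groupby(["a", "b"]).sum()

print(len_a, len_b, len(summed))
```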