groupby on NaN-only column gives IndexError · Issue #11016 · pandas-dev/pandas

Using pandas from master (0.16.2+590.g81b647f) in Python 3.4.2, the following code raises IndexError: index out of bounds:

import numpy as np
import pandas as pd
df = pd.DataFrame(dict(a=[np.nan]*3, b=[1,2,3]))
g = df.groupby(('a', 'b'))
len(g)  # IndexError

The same problem occurs when calling list(g) instead. Since NaN values are skipped according to the documentation, I guess the correct answer would be zero for len(g) and an empty list for list(g).

Strangely, iteration works, so for x in g: pass (or [x for x in g]) does not give an error (and iterates zero times). Also, g.count(), g.sum() etc. work (and return an empty DataFrame).

To add to the confusion, g.groups gives the dictionary {(nan, 1): [0], (nan, 2): [1], (nan, 3): [2]}. Shouldn’t this be empty because group keys with NaNs are dropped?

Grouping only by column 'a' or 'b' works and results in a length of 0 or 3, respectively.
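
To make the expected results concrete, here is a minimal pure-Python sketch of the semantics I have in mind, assuming that any group key containing a NaN is dropped. It is not how pandas implements groupby; the helper names is_nan and expected_groups are hypothetical and exist only for this illustration.

```python
import math
from collections import defaultdict


def is_nan(value):
    """Return True only for float NaN values."""
    return isinstance(value, float) and math.isnan(value)


def expected_groups(columns, keys):
    """Map each NaN-free group key to the row positions that carry it.

    `columns` is a dict of equal-length lists and `keys` lists the column
    names to group by. Rows whose key contains a NaN are skipped.
    """
    n_rows = len(next(iter(columns.values())))
    groups = defaultdict(list)
    for i in range(n_rows):
        key = tuple(columns[k][i] for k in keys)
        if any(is_nan(part) for part in key):
            continue  # assumed rule: keys with NaN do not form a group
        # Use a scalar key for single-column grouping, a tuple otherwise.
        groups[key if len(key) > 1 else key[0]].append(i)
    return dict(groups)


# Same data as in the report: column a is all NaN, column b is 1, 2, 3.
columns = {"a": [float("nan")] * 3, "b": [1, 2, 3]}
by_a = expected_groups(columns, ["a"])
by_b = expected_groups(columns, ["b"])
by_ab = expected_groups(columns, ["a", "b"])
print(len(by_a), len(by_b), len(by_ab))  # prints: 0 3 0
```

Under that assumption, grouping by ('a', 'b') yields no groups at all, so both len(g) and g.groups would be empty, consistent with the single-column results for 'a' and 'b'.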