Groupby "negative dimensions are not allowed" error and bad key behaviour when there are NaN values · Issue #9096 · pandas-dev/pandas
On a groupby with a composite key, if the product of the number of possible values of each key column is bigger than 2^63, calling len(grouped_data) raises ValueError: "negative dimensions are not allowed".
A simple version to reproduce it:
import pandas as pd

values = range(55109)
data = pd.DataFrame.from_dict({'a': values, 'b': values, 'c': values, 'd': values})
grouped = data.groupby(['a', 'b', 'c', 'd'])
len(grouped)
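To see why 55109 is the threshold in the reproduction above, here is a small arithmetic sketch. It assumes the error comes from the product of the per-column cardinalities wrapping around in a signed 64-bit integer; that mechanism is my assumption and is not confirmed against the pandas source here. The names `n_unique`, `n_keys`, and `wrapped` are illustrative only.

```python
import ctypes

n_unique = 55109   # distinct values in each key column of the example
n_keys = 4         # number of key columns: a, b, c, d

# Python integers have arbitrary precision, so this is the exact product.
exact = n_unique ** n_keys
int64_max = 2 ** 63 - 1

print(exact > int64_max)                 # the product no longer fits in int64
print((n_unique - 1) ** n_keys > int64_max)  # one value fewer still fits

# Emulate signed 64-bit wraparound (assumed to be what happens internally).
wrapped = ctypes.c_int64(exact).value
print(wrapped)                           # a negative number
```

A negative size of this kind would be consistent with the "negative dimensions are not allowed" message, which NumPy raises when asked to allocate an array with a negative shape.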
A side effect of this error is that NaN values in the key columns are not ignored as they should be. Instead, the NaN keys are replaced with other values that are present in the index.
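For comparison, this is the behaviour I would expect on a small frame, where no overflow is involved: rows whose key contains NaN are dropped from the groups. The frame and column names below are made up for illustration and are not taken from the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the third row has a NaN in key column 'a'.
df = pd.DataFrame({
    "a": [1.0, 1.0, np.nan, 2.0],
    "b": ["x", "x", "y", "y"],
    "v": [10, 20, 30, 40],
})

# By default groupby excludes rows whose key contains NaN.
sizes = df.groupby(["a", "b"]).size()
print(sizes)

# No group key should be NaN, and the NaN row should not be counted.
print(sizes.index.get_level_values("a").isna().any())
print(sizes.sum())
```

In the failing case described above, the NaN rows instead show up under keys that belong to other rows.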
Here is a complete IPython notebook example that reproduces it:
http://nbviewer.ipython.org/gist/jordeu/cd86fc99f5f89451cf93