BUG: Groupby NaT Handling
xref #9236
There seems to be an inconsistency in some GroupBy methods when NaT is included in the group key.
- GroupBy.groups includes NaT as a key.
- GroupBy.ngroups doesn't count NaT.
- GroupBy.__iter__ doesn't return the NaT group.
- GroupBy.get_group fails when NaT is specified.
I understand NaT should be included in the group key, consistent with the behaviour of other functions such as dropna. Is it OK to fix these methods so that they include NaT?
>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame({'values': np.random.randn(8),
...                    'dt': [np.nan, pd.Timestamp('2013-01-01'), np.nan, pd.Timestamp('2013-02-01'),
...                           np.nan, pd.Timestamp('2013-02-01'), np.nan, pd.Timestamp('2013-01-01')]})
>>> grouped = df.groupby('dt')
>>> grouped.groups
{numpy.datetime64('NaT'): [0, 2, 4, 6], numpy.datetime64('2013-01-01T09:00:00.000000000+0900'): [1, 7], numpy.datetime64('2013-02-01T09:00:00.000000000+0900'): [3, 5]}
>>> grouped.ngroups
2
>>> grouped.indices
ValueError: DatetimeIndex with NaT cannot be converted to object
>>> grouped.get_group(pd.NaT)
ValueError: DatetimeIndex with NaT cannot be converted to object
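The consistency this issue asks for can be sketched without pandas. The toy class below is hypothetical and is not how pandas implements GroupBy: it models a missing timestamp as `None` or `float('nan')`, folds every missing value onto one sentinel (needed because NaN != NaN, so a missing key would never match itself in a lookup), and derives `groups`, `ngroups`, `indices`, iteration and `get_group` from a single mapping. Because all five accessors read the same mapping, they cannot disagree the way the output above does. The `include_missing` flag is an assumption added only to show that excluding missing keys can be made just as consistent.

```python
import math
from datetime import date
from typing import Any, Dict, Hashable, Iterator, List, Sequence, Tuple


class _MissingKey:
    """Hypothetical single sentinel standing in for NaT/NaN group keys."""

    def __repr__(self) -> str:
        return "MISSING"


MISSING = _MissingKey()


def _normalize(key: Any) -> Hashable:
    # NaN != NaN, so map every missing value onto one hashable sentinel.
    if key is None or (isinstance(key, float) and math.isnan(key)):
        return MISSING
    return key


class SimpleGroupBy:
    """Toy group-by: every accessor is derived from one key -> positions map."""

    def __init__(self, keys: Sequence[Any], include_missing: bool = True) -> None:
        self._indices: Dict[Hashable, List[int]] = {}
        for pos, raw_key in enumerate(keys):
            key = _normalize(raw_key)
            if key is MISSING and not include_missing:
                continue  # drop missing keys everywhere, not just in some views
            self._indices.setdefault(key, []).append(pos)

    @property
    def groups(self) -> Dict[Hashable, List[int]]:
        return {k: list(v) for k, v in self._indices.items()}

    @property
    def indices(self) -> Dict[Hashable, List[int]]:
        return self.groups

    @property
    def ngroups(self) -> int:
        return len(self._indices)

    def __iter__(self) -> Iterator[Tuple[Hashable, List[int]]]:
        return iter([(k, list(v)) for k, v in self._indices.items()])

    def get_group(self, key: Any) -> List[int]:
        # Raises KeyError if the (normalized) key is not a group.
        return list(self._indices[_normalize(key)])


if __name__ == "__main__":
    nan = float("nan")
    keys = [nan, date(2013, 1, 1), nan, date(2013, 2, 1),
            nan, date(2013, 2, 1), nan, date(2013, 1, 1)]
    g = SimpleGroupBy(keys)
    print(g.ngroups, len(list(g)), len(g.groups))
    print(g.get_group(float("nan")))
```

With the repro's eight keys, `ngroups`, the number of iterated groups and the size of `groups` all agree, and looking up a fresh `float("nan")` finds the missing-key group at positions 0, 2, 4 and 6.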