Resample category data with timedelta index · Issue #12169 · pandas-dev/pandas
Hi,
I get very strange behavior when I try to resample categorical data with a timedelta index, compared to a datetime index.
>> d1 = pd.DataFrame({'Group_obj': 'A'}, index=pd.date_range('2000-1-1', periods=20, freq='s'))
>> d1['Group'] = d1['Group_obj'].astype('category')
>> d1
Group_obj Group
2000-01-01 00:00:00 A A
2000-01-01 00:00:01 A A
2000-01-01 00:00:02 A A
2000-01-01 00:00:03 A A
2000-01-01 00:00:04 A A
2000-01-01 00:00:05 A A
2000-01-01 00:00:06 A A
2000-01-01 00:00:07 A A
2000-01-01 00:00:08 A A
2000-01-01 00:00:09 A A
2000-01-01 00:00:10 A A
2000-01-01 00:00:11 A A
2000-01-01 00:00:12 A A
2000-01-01 00:00:13 A A
2000-01-01 00:00:14 A A
2000-01-01 00:00:15 A A
2000-01-01 00:00:16 A A
2000-01-01 00:00:17 A A
2000-01-01 00:00:18 A A
2000-01-01 00:00:19 A A
>> corr = d1.resample('10s', how=lambda x: (x.value_counts().index[0]))
>> corr
Group_obj Group
2000-01-01 00:00:00 A A
2000-01-01 00:00:10 A A
>> corr.dtypes
Group_obj object
Group object
dtype: object
>> d2 = d1.set_index(pd.to_timedelta(list(range(20)), unit='s'))
>> fxx = d2.resample('10s', how=lambda x: (x.value_counts().index[0]))
>> fxx
Group_obj Group
00:00:00 A NaN
00:00:10 A NaN
>> fxx.dtypes
Group_obj object
Group float64
dtype: object
It seems that when a timedelta index is used, the aggregated result for the categorical column is always NaN, and the column's dtype becomes float64 instead of object.
Is this the expected behavior?
Thanks
PS: Is there a way to specify the dtype of the aggregated columns?
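
For reference, here is a minimal sketch of one possible workaround, not a confirmed fix: convert the categorical column to plain object values before resampling, then cast the aggregated result back to category afterwards. It uses the method-chaining form `.resample(...).agg(...)`, which newer pandas versions use in place of the `how=` argument. The helper name `most_frequent` is hypothetical, and the sketch assumes every 10s bin contains at least one row (an empty bin would make `index[0]` raise).

```python
import pandas as pd


def most_frequent(s):
    # Hypothetical helper: value_counts() sorts by count (descending),
    # so the first index label is the most frequent value in the bin.
    return s.value_counts().index[0]


# Same shape as d2 above: 20 rows, one per second, on a timedelta index.
idx = pd.to_timedelta(list(range(20)), unit='s')
d2 = pd.DataFrame({'Group': ['A'] * 20}, index=idx)
d2['Group'] = d2['Group'].astype('category')

# Aggregate on plain object values to sidestep the categorical column ...
as_obj = d2['Group'].astype(object)
res = as_obj.resample('10s').agg(most_frequent)

# ... then set the dtype of the aggregated result explicitly.
res = res.astype('category')

print(res)
print(res.dtype)
```

The explicit `astype` after the aggregation is also one way to get a specific dtype on the aggregated columns, since the cast is applied to the result rather than to the input.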