Resample category data with timedelta index · Issue #12169 · pandas-dev/pandas

Hi,

I get very strange behavior when I try to resample categorical data with a timedelta index, compared to a datetime index.

>> d1 = pd.DataFrame({'Group_obj': 'A'}, index=pd.date_range('2000-1-1', periods=20, freq='s'))
>> d1['Group'] = d1['Group_obj'].astype('category')
>> d1
                    Group_obj Group
2000-01-01 00:00:00         A     A
2000-01-01 00:00:01         A     A
2000-01-01 00:00:02         A     A
2000-01-01 00:00:03         A     A
2000-01-01 00:00:04         A     A
2000-01-01 00:00:05         A     A
2000-01-01 00:00:06         A     A
2000-01-01 00:00:07         A     A
2000-01-01 00:00:08         A     A
2000-01-01 00:00:09         A     A
2000-01-01 00:00:10         A     A
2000-01-01 00:00:11         A     A
2000-01-01 00:00:12         A     A
2000-01-01 00:00:13         A     A
2000-01-01 00:00:14         A     A
2000-01-01 00:00:15         A     A
2000-01-01 00:00:16         A     A
2000-01-01 00:00:17         A     A
2000-01-01 00:00:18         A     A
2000-01-01 00:00:19         A     A

>> corr = d1.resample('10s', how=lambda x: (x.value_counts().index[0]))
>> corr
                    Group_obj Group
2000-01-01 00:00:00         A     A
2000-01-01 00:00:10         A     A

>> corr.dtypes
Group_obj    object
Group        object
dtype: object

>> d2 = d1.set_index(pd.to_timedelta(list(range(20)), unit='s'))
>> fxx = d2.resample('10s', how=lambda x: (x.value_counts().index[0]))
>> fxx
         Group_obj  Group
00:00:00         A    NaN
00:00:10         A    NaN

>> fxx.dtypes
Group_obj     object
Group        float64
dtype: object

It seems that when a timedelta index is used, the aggregated result for the categorical column is always NaN (and its dtype becomes float64), while the object column is aggregated correctly.
Is this the expected behavior?
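
For anyone reproducing this on a newer pandas release, where `resample` no longer accepts the `how=` argument, the timedelta case can be written with `.agg` instead. This is only a sketch of the same steps as above; the helper name `most_frequent` is an illustrative choice, and no output is shown because the result may differ between versions.

```python
import pandas as pd

def most_frequent(s):
    # Hypothetical helper: the most common value within one resampling bin.
    return s.value_counts().index[0]

d1 = pd.DataFrame({'Group_obj': 'A'},
                  index=pd.date_range('2000-1-1', periods=20, freq='s'))
d1['Group'] = d1['Group_obj'].astype('category')

# Same data, but indexed by timedeltas instead of timestamps.
d2 = d1.set_index(pd.to_timedelta(list(range(20)), unit='s'))

# Counterpart of resample('10s', how=...) in the method-chaining API.
fxx = d2.resample('10s').agg(most_frequent)
print(fxx)
print(fxx.dtypes)
```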

Thanks

PS: is there a way to specify the dtype for the aggregated columns?
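
One possible workaround, offered as an assumption rather than a confirmed fix: aggregate the column as plain objects, then set the dtype of the result explicitly with `astype`. The sketch below uses the newer `.agg` API and a hypothetical helper `most_frequent`; it does not show a resample option for setting the output dtype directly.

```python
import pandas as pd

def most_frequent(s):
    # Hypothetical helper: the most common value within one resampling bin.
    return s.value_counts().index[0]

idx = pd.to_timedelta(list(range(20)), unit='s')
d2 = pd.DataFrame({'Group': pd.Categorical(['A'] * 20)}, index=idx)

# Aggregate an object-dtype copy so the categorical dtype plays no part in resampling.
agg = d2['Group'].astype(object).resample('10s').agg(most_frequent)

# Choose the dtype of the aggregated column explicitly afterwards.
out = agg.astype('category').to_frame()
print(out.dtypes)
```

Casting after the aggregation keeps the dtype decision in one place, at the cost of an extra copy of the column.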