groupby.mean, etc, doesn't recognize timedelta64 · Issue #5724 · pandas-dev/pandas

See http://stackoverflow.com/questions/20625982/split-apply-combine-on-pandas-timedelta-column
Also related: http://stackoverflow.com/questions/20789976/python-pandas-dataframe-1st-line-issue-with-datetime-timedelta/20802902#20802902

I have a DataFrame with a column of timedeltas (on inspection its dtype is timedelta64[ns], i.e. '<m8[ns]'), and I'd like to do a split-apply-combine, but the timedelta column is being dropped:

```
In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: pd.__version__
Out[3]: '0.13.0rc1'

In [4]: np.__version__
Out[4]: '1.8.0'

In [5]: data = pd.DataFrame(np.random.rand(10, 3), columns=['f1', 'f2', 'td'])

In [6]: data['td'] *= 10000000

In [7]: data['td'] = pd.Series(data['td'], dtype='<m8[ns]')

In [8]: data
Out[8]:
         f1        f2              td
0  0.990140  0.948313 00:00:00.003066
1  0.277125  0.993549 00:00:00.001443
2  0.016427  0.581129 00:00:00.009257
3  0.048662  0.512215 00:00:00.000702
4  0.846301  0.179160 00:00:00.000396
5  0.568323  0.419887 00:00:00.000266
6  0.328182  0.919897 00:00:00.006138
7  0.292882  0.213219 00:00:00.008876
8  0.623332  0.003409 00:00:00.000322
9  0.650436  0.844180 00:00:00.006873

[10 rows x 3 columns]
```
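For anyone reproducing this on a current pandas release, a minimal sketch of the same setup follows. It assumes pandas 1.x or 2.x and NumPy's `default_rng` API, uses a fixed seed (0, an arbitrary choice not in the original report), and converts the scaled values with `pd.to_timedelta` instead of passing `dtype='<m8[ns]'` to the `Series` constructor. `'<m8[ns]'` is simply NumPy's little-endian spelling of `timedelta64[ns]`.

```python
import numpy as np
import pandas as pd

# Seeded stand-in for the unseeded random data in the report (seed is an assumption).
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.random((10, 3)), columns=["f1", "f2", "td"])

# Scale [0, 1) floats to whole nanoseconds, then convert explicitly to timedelta64[ns].
data["td"] = pd.to_timedelta((data["td"] * 10_000_000).astype("int64"), unit="ns")

print(data.dtypes)
```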

```
In [9]: data.groupby(data.index < 5).mean()
Out[9]:
             f1        f2
False  0.492631  0.480118
True   0.435731  0.642873

[2 rows x 2 columns]
```
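In pandas versions affected by this issue, one workaround (a sketch, not an official pandas API) is to aggregate the underlying int64 nanosecond counts, which groupby treats as an ordinary numeric column, and convert the result back to timedeltas. The seeded frame is a hypothetical stand-in for the random data above, and rounding the per-group means to whole nanoseconds before converting is an assumption made to keep the conversion exact.

```python
import numpy as np
import pandas as pd

# Hypothetical seeded stand-in for the frame in the report.
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.random((10, 3)), columns=["f1", "f2", "td"])
data["td"] = pd.to_timedelta((data["td"] * 10_000_000).astype("int64"), unit="ns")

key = data.index < 5  # boolean group labels, as in the report

# Reinterpret the timedelta64[ns] values as their int64 nanosecond counts.
ns = pd.Series(data["td"].to_numpy().view("int64"), index=data.index)

# Group and average the plain integers (the result is float64).
mean_ns = ns.groupby(key).mean()

# Round to whole nanoseconds and convert back to timedelta64[ns].
td_mean = pd.to_timedelta(mean_ns.round().astype("int64"), unit="ns")
print(td_mean)
```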

Selecting the 'td' column explicitly, to force pandas to attempt the operation on it, raises an error instead:

```
In [10]: data.groupby(data.index < 5)['td'].mean()

DataError                                 Traceback (most recent call last)
 in ()
----> 1 data.groupby(data.index < 5)['td'].mean()

/path/to/lib/python3.3/site-packages/pandas-0.13.0rc1-py3.3-linux-x86_64.egg/pandas/core/groupby.py in mean(self)
    417         """
    418         try:
--> 419             return self._cython_agg_general('mean')
    420         except GroupByError:
    421             raise

/path/to/lib/python3.3/site-packages/pandas-0.13.0rc1-py3.3-linux-x86_64.egg/pandas/core/groupby.py in _cython_agg_general(self, how, numeric_only)
    669
    670         if len(output) == 0:
--> 671             raise DataError('No numeric types to aggregate')
    672
    673         return self._wrap_aggregated_output(output, names)

DataError: No numeric types to aggregate
```
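For comparison, here is a sketch of the same call on a recent pandas release. The assumption is pandas 2.x, where groupby `mean` accepts timedelta64 columns; there it should return a timedelta64 Series indexed by the boolean group keys rather than raising `DataError`. The seeded frame is again a hypothetical stand-in for the report's data.

```python
import numpy as np
import pandas as pd

# Hypothetical seeded stand-in for the frame in the report.
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.random((10, 3)), columns=["f1", "f2", "td"])
data["td"] = pd.to_timedelta((data["td"] * 10_000_000).astype("int64"), unit="ns")

key = data.index < 5

# Assumption: a pandas version where this issue is resolved (e.g. 2.x),
# so the timedelta column is aggregated directly.
per_group = data.groupby(key)["td"].mean()
print(per_group)
```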

However, taking the mean of the whole column works fine, so the aggregation itself is clearly possible for this dtype:

```
In [11]: data['td'].mean()
Out[11]:
0   00:00:00.003734
dtype: timedelta64[ns]
```
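The column-level mean can work because timedelta64[ns] values are stored as int64 nanosecond counts, so averaging them is ordinary integer arithmetic. A small sketch with three hypothetical durations (not taken from the data above), assuming a recent pandas where `Series.mean` on a timedelta column returns a `Timedelta` scalar:

```python
import numpy as np
import pandas as pd

# Three hypothetical durations, given directly in nanoseconds.
td = pd.Series(pd.to_timedelta([3_000, 1_000, 8_000], unit="ns"))

# The timedelta64[ns] buffer reinterpreted as its int64 nanosecond counts.
raw = td.to_numpy().view("int64")
print(raw)          # [3000 1000 8000]

# The timedelta mean matches the mean of those integers (4000 ns here).
print(td.mean())
print(raw.mean())   # 4000.0
```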