Inconsistent return type when grouping dates by frequency with custom reduction function · Issue #11742 · pandas-dev/pandas
If I group a DataFrame by a column of dates, the return type varies depending on whether I just group or whether I also apply a frequency in the Grouper.
Grouping without resampling dates returns a DataFrame when I apply a function which returns a labeled Series, or a Series if the function returns a scalar:
In [1]: import pandas as pd
In [2]: df = pd.DataFrame({'date': ['10/10/2000', '11/10/2000'], 'value': [10, 13]})
In [3]: def sumfunc(x):
   ...:     return pd.Series([x['value'].sum()], ('sum',))
   ...:
In [4]: df.groupby(pd.Grouper(key='date')).apply(sumfunc)
Out[4]:
sum
date
10/10/2000 10
11/10/2000 13
In [5]: type(df.groupby(pd.Grouper(key='date')).apply(sumfunc))
Out[5]: pandas.core.frame.DataFrame
In [17]: df.groupby(pd.Grouper(key='date')).apply(lambda x: x.value.sum())
Out[17]:
date
2000-10-10 10
2000-11-10 13
dtype: int64
In [18]: type(df.groupby(pd.Grouper(key='date')).apply(lambda x: x.value.sum()))
Out[18]: pandas.core.series.Series
If I apply a frequency in the Grouper, I get a Series with a multi-index when the function returns a labeled Series, or a TypeError when it returns a scalar:
In [6]: df['date'] = pd.to_datetime(df['date'])
In [7]: df.groupby(pd.Grouper(freq='M', key='date')).apply(sumfunc)
Out[7]:
date
2000-10-31 sum 10
2000-11-30 sum 13
dtype: int64
In [8]: type(df.groupby(pd.Grouper(freq='M', key='date')).apply(sumfunc))
Out[8]: pandas.core.series.Series
In [16]: df.groupby(pd.Grouper(freq='M', key='date')).apply(lambda x: x.value.sum())
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-16-ad73d0ebc475> in <module>()
----> 1 df.groupby(pd.Grouper(freq='M', key='date')).apply(lambda x: x.value.sum())
/Users/shoover/.py35/lib/python3.5/site-packages/pandas/core/groupby.py in apply(self, func, *args, **kwargs)
713 # ignore SettingWithCopy here in case the user mutates
714 with option_context('mode.chained_assignment',None):
--> 715 return self._python_apply_general(f)
716
717 def _python_apply_general(self, f):
/Users/shoover/.py35/lib/python3.5/site-packages/pandas/core/groupby.py in _python_apply_general(self, f)
720
721 return self._wrap_applied_output(keys, values,
--> 722 not_indexed_same=mutated)
723
724 def aggregate(self, func, *args, **kwargs):
/Users/shoover/.py35/lib/python3.5/site-packages/pandas/core/groupby.py in _wrap_applied_output(self, keys, values, not_indexed_same)
3253 # Handle cases like BinGrouper
3254 return self._concat_objects(keys, values,
-> 3255 not_indexed_same=not_indexed_same)
3256
3257 def _transform_general(self, func, *args, **kwargs):
/Users/shoover/.py35/lib/python3.5/site-packages/pandas/core/groupby.py in _concat_objects(self, keys, values, not_indexed_same)
1271 group_names = self.grouper.names
1272 result = concat(values, axis=self.axis, keys=group_keys,
-> 1273 levels=group_levels, names=group_names)
1274 else:
1275
/Users/shoover/.py35/lib/python3.5/site-packages/pandas/tools/merge.py in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, copy)
810 keys=keys, levels=levels, names=names,
811 verify_integrity=verify_integrity,
--> 812 copy=copy)
813 return op.get_result()
814
/Users/shoover/.py35/lib/python3.5/site-packages/pandas/tools/merge.py in __init__(self, objs, axis, join, join_axes, keys, levels, names, ignore_index, verify_integrity, copy)
866 for obj in objs:
867 if not isinstance(obj, NDFrame):
--> 868 raise TypeError("cannot concatenate a non-NDFrame object")
869
870 # consolidate
TypeError: cannot concatenate a non-NDFrame object
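For comparison, here is a minimal sketch of the same monthly sums computed with built-in aggregations rather than apply. It is not a fix for the behaviour above, only a way to see what a consistent result could look like. Two assumptions: the dates are written in ISO form so parsing is unambiguous, and the frequency is 'MS' (month start) rather than 'M', because the spelling of the month-end alias differs between pandas versions.

```python
import pandas as pd

# Same two rows as in the report, with ISO dates (an assumption for clarity).
df = pd.DataFrame({
    "date": pd.to_datetime(["2000-10-10", "2000-11-10"]),
    "value": [10, 13],
})

# 'MS' labels each bin by the first day of its month (assumed stand-in for 'M').
grouped = df.groupby(pd.Grouper(freq="MS", key="date"))

# Built-in reduction on one column: one value per monthly bin, as a Series.
monthly_sum = grouped["value"].sum()

# Named aggregation: the same numbers as a one-column DataFrame labelled "sum".
monthly_frame = grouped.agg(sum=("value", "sum"))

print(type(monthly_sum).__name__, type(monthly_frame).__name__)
```

With these calls the scalar-style reduction comes back as a Series and the labeled one as a DataFrame, which matches what the non-frequency Grouper returns from apply.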
In this example, assigning dates to months leaves the groups unchanged, so I would have expected identical results whether I set freq='M' or not. I'm guessing the difference is that freq='M' causes an extra groupby to happen under the hood, yes? When I ran into this, I expected pd.Grouper(freq='M', key='date') to do a single groupby, combining rows whose dates happen to fall in the same month.
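To make that expectation concrete, here is a rough sketch comparing a single groupby keyed on each row's calendar month with a frequency-based Grouper over the same rows. The extra row for 2000-10-25 is hypothetical, added only so that one month actually combines two rows. As an assumption, the Grouper uses 'MS' (month start) instead of 'M' to stay independent of the pandas version.

```python
import pandas as pd

# The report's two rows plus one hypothetical row in October.
df = pd.DataFrame({
    "date": pd.to_datetime(["2000-10-10", "2000-10-25", "2000-11-10"]),
    "value": [10, 5, 13],
})

# A single groupby whose key is the monthly period each date falls in.
by_period = df.groupby(df["date"].dt.to_period("M"))["value"].sum()

# Frequency-based grouping of the same rows into monthly bins.
by_freq = df.groupby(pd.Grouper(freq="MS", key="date"))["value"].sum()

# Both approaches produce the same per-month totals; only the index labels differ.
print(by_period.tolist(), by_freq.tolist())
```

The two results carry different index types (periods versus timestamps), but the grouping itself is the same, which is the behaviour I expected from apply as well.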
Pandas version:
In [9]: pd.__version__
Out[9]: '0.17.1+22.g0c43fcc'