Inconsistent return type when grouping dates by frequency with custom reduction function · Issue #11742 · pandas-dev/pandas

If I group a DataFrame by a column of dates, the return type of apply varies depending on whether I just group or whether I also pass a frequency to the Grouper.

Grouping without a frequency returns a DataFrame when the applied function returns a labeled Series, or a Series when it returns a scalar:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'date': ['10/10/2000', '11/10/2000'], 'value': [10, 13]})

In [3]: def sumfunc(x):
   ...:     return pd.Series([x['value'].sum()], ('sum',))
   ...: 

In [4]: df.groupby(pd.Grouper(key='date')).apply(sumfunc)
Out[4]: 
            sum
date           
10/10/2000   10
11/10/2000   13

In [5]: type(df.groupby(pd.Grouper(key='date')).apply(sumfunc))
Out[5]: pandas.core.frame.DataFrame

In [17]: df.groupby(pd.Grouper(key='date')).apply(lambda x: x.value.sum())
Out[17]: 
date
2000-10-10    10
2000-11-10    13
dtype: int64

In [18]: type(df.groupby(pd.Grouper(key='date')).apply(lambda x: x.value.sum()))
Out[18]: pandas.core.series.Series

With a frequency in the Grouper, I instead get a Series with a MultiIndex when the function returns a labeled Series, and a TypeError when it returns a scalar.

In [6]: df['date'] = pd.to_datetime(df['date'])

In [7]: df.groupby(pd.Grouper(freq='M', key='date')).apply(sumfunc)
Out[7]: 
date           
2000-10-31  sum    10
2000-11-30  sum    13
dtype: int64

In [8]: type(df.groupby(pd.Grouper(freq='M', key='date')).apply(sumfunc))
Out[8]: pandas.core.series.Series

In [16]: df.groupby(pd.Grouper(freq='M', key='date')).apply(lambda x: x.value.sum())
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-16-ad73d0ebc475> in <module>()
----> 1 df.groupby(pd.Grouper(freq='M', key='date')).apply(lambda x: x.value.sum())

/Users/shoover/.py35/lib/python3.5/site-packages/pandas/core/groupby.py in apply(self, func, *args, **kwargs)
    713         # ignore SettingWithCopy here in case the user mutates
    714         with option_context('mode.chained_assignment',None):
--> 715             return self._python_apply_general(f)
    716 
    717     def _python_apply_general(self, f):

/Users/shoover/.py35/lib/python3.5/site-packages/pandas/core/groupby.py in _python_apply_general(self, f)
    720 
    721         return self._wrap_applied_output(keys, values,
--> 722                                          not_indexed_same=mutated)
    723 
    724     def aggregate(self, func, *args, **kwargs):

/Users/shoover/.py35/lib/python3.5/site-packages/pandas/core/groupby.py in _wrap_applied_output(self, keys, values, not_indexed_same)
   3253             # Handle cases like BinGrouper
   3254             return self._concat_objects(keys, values,
-> 3255                                         not_indexed_same=not_indexed_same)
   3256 
   3257     def _transform_general(self, func, *args, **kwargs):

/Users/shoover/.py35/lib/python3.5/site-packages/pandas/core/groupby.py in _concat_objects(self, keys, values, not_indexed_same)
   1271                 group_names = self.grouper.names
   1272                 result = concat(values, axis=self.axis, keys=group_keys,
-> 1273                                 levels=group_levels, names=group_names)
   1274             else:
   1275 

/Users/shoover/.py35/lib/python3.5/site-packages/pandas/tools/merge.py in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, copy)
    810                        keys=keys, levels=levels, names=names,
    811                        verify_integrity=verify_integrity,
--> 812                        copy=copy)
    813     return op.get_result()
    814 

/Users/shoover/.py35/lib/python3.5/site-packages/pandas/tools/merge.py in __init__(self, objs, axis, join, join_axes, keys, levels, names, ignore_index, verify_integrity, copy)
    866         for obj in objs:
    867             if not isinstance(obj, NDFrame):
--> 868                 raise TypeError("cannot concatenate a non-NDFrame object")
    869 
    870             # consolidate

TypeError: cannot concatenate a non-NDFrame object

Since in this example assigning dates to months leaves the groups unchanged, I would have expected identical results whether or not I set freq='M'. I'm guessing the difference is that freq='M' causes an extra groupby to happen under the hood, yes? What I expected was for pd.Grouper(freq='M', key='date') to do a single groupby, combining rows whose dates fall into the same month.
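For reference, grouping by an explicit month key seems to give the behaviour I expected. This is just a sketch of the workaround I ended up with, not something I've tested beyond this toy example:

import pandas as pd

df = pd.DataFrame({'date': ['10/10/2000', '11/10/2000'], 'value': [10, 13]})
df['date'] = pd.to_datetime(df['date'])

def sumfunc(x):
    return pd.Series([x['value'].sum()], ('sum',))

# Group by an explicit month key instead of Grouper(freq='M'); this goes
# through the plain groupby path, so the return types match the no-freq case.
by_month = df.groupby(df['date'].dt.to_period('M'))

by_month.apply(sumfunc)                  # DataFrame with a single 'sum' column
by_month.apply(lambda x: x.value.sum())  # Series of scalars, no TypeError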

Pandas version:

In [9]: pd.__version__
Out[9]: '0.17.1+22.g0c43fcc'