BUG: RecursionError using agg on a resampled SeriesGroupBy · Issue #42905 · pandas-dev/pandas


When you combine resample with groupby and use the agg method to supply multiple functions to either a DataFrameGroupBy or a SeriesGroupBy, Python suddenly exits without even raising an error.

I first thought I was running into this because I was supplying a single column expecting a DataFrame with multiple columns, but I can confirm this happens to me whether I provide a column (variable b) or apply the method to the entire GroupBy (variable c):

Code Sample

import pandas as pd

a = pd.DataFrame({ 'class': { 0: 'beta', 1: 'alpha', 2: 'alpha', 3: 'gaga', 4: 'beta', 5: 'gaga', 6: 'beta', 7: 'gaga', 8: 'beta', 9: 'gaga', 10: 'alpha', 11: 'beta', 12: 'alpha', 13: 'gaga', 14: 'alpha'}, 'value': { 0: 69, 1: 33, 2: 40, 3: 2, 4: 36, 5: 40, 6: 48, 7: 84, 8: 77, 9: 22, 10: 55, 11: 82, 12: 37, 13: 88, 14: 41}, 'date': { 0: pd.Timestamp('2021-02-28 00:00:00'), 1: pd.Timestamp('2021-11-30 00:00:00'), 2: pd.Timestamp('2021-02-28 00:00:00'), 3: pd.Timestamp('2021-04-30 00:00:00'), 4: pd.Timestamp('2021-02-28 00:00:00'), 5: pd.Timestamp('2021-04-30 00:00:00'), 6: pd.Timestamp('2021-07-31 00:00:00'), 7: pd.Timestamp('2021-01-31 00:00:00'), 8: pd.Timestamp('2021-01-31 00:00:00'), 9: pd.Timestamp('2021-01-31 00:00:00'), 10: pd.Timestamp('2021-04-30 00:00:00'), 11: pd.Timestamp('2021-10-31 00:00:00'), 12: pd.Timestamp('2021-09-30 00:00:00'), 13: pd.Timestamp('2021-04-30 00:00:00'), 14: pd.Timestamp('2021-05-31 00:00:00')}})

# This will exit Python
b = (
    a
    .set_index('date')
    .groupby('class')
    .resample('M')['value']
    .agg(['sum', 'size'])
)

# Not selecting a column will ALSO make Python exit
c = (
    a
    .set_index('date')
    .groupby('class')
    .resample('M')
    .agg(['sum', 'size'])
)

Problem description

I'm not sure whether this method is supported for DatetimeIndexResamplerGroupby objects, but the attribute exists: referencing agg without calling it gives:

<bound method Resampler.aggregate of <pandas.core.resample.DatetimeIndexResamplerGroupby object at 0x00000163B22B0100>>

Also, while the problem arises with either a Series or a DataFrame, given that using agg with multiple functions on a SeriesGroupBy will correctly create a DataFrame, I would expect the same to happen when resampling with timestamps:

In [1]: a.groupby('class')['value'].agg(['sum', 'size'])
Out[1]:
       sum  size
class
alpha  206     5
beta   312     5
gaga   236     5

Expected Output

                  sum  size
class date
alpha 2021-02-28   40     1
      2021-03-31    0     0
      2021-04-30   55     1
      2021-05-31   41     1
      2021-06-30    0     0
      2021-07-31    0     0
      2021-08-31    0     0
      2021-09-30   37     1
      2021-10-31    0     0
      2021-11-30   33     1
beta  2021-01-31   77     1
      2021-02-28  105     2
      2021-03-31    0     0
      2021-04-30    0     0
      2021-05-31    0     0
      2021-06-30    0     0
      2021-07-31   48     1
      2021-08-31    0     0
      2021-09-30    0     0
      2021-10-31   82     1
gaga  2021-01-31  106     2
      2021-02-28    0     0
      2021-03-31    0     0
      2021-04-30  130     3
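
A possible interim workaround, which is not part of the original report, is to avoid passing a list to agg and instead call each reduction once, then combine the results. The sketch below assumes that single reductions such as sum() and size() on the resampled groupby are unaffected by the bug. It rebuilds the same data from plain lists and passes pd.offsets.MonthEnd() rather than the 'M' alias, because newer pandas releases deprecate 'M' in favor of 'ME'. The names resampled and result are illustrative only.

```python
import pandas as pd

# Same data as in the report, written as plain lists for brevity.
a = pd.DataFrame({
    "class": ["beta", "alpha", "alpha", "gaga", "beta", "gaga", "beta", "gaga",
              "beta", "gaga", "alpha", "beta", "alpha", "gaga", "alpha"],
    "value": [69, 33, 40, 2, 36, 40, 48, 84, 77, 22, 55, 82, 37, 88, 41],
    "date": pd.to_datetime([
        "2021-02-28", "2021-11-30", "2021-02-28", "2021-04-30", "2021-02-28",
        "2021-04-30", "2021-07-31", "2021-01-31", "2021-01-31", "2021-01-31",
        "2021-04-30", "2021-10-31", "2021-09-30", "2021-04-30", "2021-05-31",
    ]),
})

# Month-end resampling per class, restricted to the 'value' column.
resampled = (
    a.set_index("date")
    .groupby("class")
    .resample(pd.offsets.MonthEnd())["value"]
)

# One reduction per call instead of agg([...]); concat aligns both
# Series on their shared (class, date) index and names the columns.
result = pd.concat({"sum": resampled.sum(), "size": resampled.size()}, axis=1)
print(result)
```

If the assumption holds, result should have the same shape as the expected output above: one row per class and month end, with empty months filled in within each class's own date range.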

Output of pd.show_versions()

INSTALLED VERSIONS

commit : c7f7443
python : 3.9.2.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.18362
machine : AMD64
processor : Intel64 Family 6 Model 142 Stepping 9, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : pt_BR.cp1252

pandas : 1.3.1
numpy : 1.20.2
pytz : 2021.1
dateutil : 2.8.1
pip : 21.2.1
setuptools : 49.2.1
Cython : None
pytest : None
hypothesis : None
sphinx : 3.5.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.6.3
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.3
IPython : 7.24.1
pandas_datareader: None
bs4 : 4.9.3
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.4.1
numexpr : None
odfpy : None
openpyxl : 3.0.6
pandas_gbq : None
pyarrow : None
pyxlsb : 1.0.8
s3fs : None
scipy : 1.7.0
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None