BUG: RecursionError using agg on a resampled SeriesGroupBy · Issue #42905 · pandas-dev/pandas
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
When you combine resample with groupby and use the agg method to supply multiple functions to either a DataFrameGroupBy or a SeriesGroupBy, Python suddenly exits without even raising an error.
At first I thought this happened because I was selecting a single column while expecting a DataFrame with multiple columns, but it happens whether I select a column (variable b) or apply the method to the entire GroupBy (variable c):
Code Sample
import pandas as pd
a = pd.DataFrame({ 'class': { 0: 'beta', 1: 'alpha', 2: 'alpha', 3: 'gaga', 4: 'beta', 5: 'gaga', 6: 'beta', 7: 'gaga', 8: 'beta', 9: 'gaga', 10: 'alpha', 11: 'beta', 12: 'alpha', 13: 'gaga', 14: 'alpha'}, 'value': { 0: 69, 1: 33, 2: 40, 3: 2, 4: 36, 5: 40, 6: 48, 7: 84, 8: 77, 9: 22, 10: 55, 11: 82, 12: 37, 13: 88, 14: 41}, 'date': { 0: pd.Timestamp('2021-02-28 00:00:00'), 1: pd.Timestamp('2021-11-30 00:00:00'), 2: pd.Timestamp('2021-02-28 00:00:00'), 3: pd.Timestamp('2021-04-30 00:00:00'), 4: pd.Timestamp('2021-02-28 00:00:00'), 5: pd.Timestamp('2021-04-30 00:00:00'), 6: pd.Timestamp('2021-07-31 00:00:00'), 7: pd.Timestamp('2021-01-31 00:00:00'), 8: pd.Timestamp('2021-01-31 00:00:00'), 9: pd.Timestamp('2021-01-31 00:00:00'), 10: pd.Timestamp('2021-04-30 00:00:00'), 11: pd.Timestamp('2021-10-31 00:00:00'), 12: pd.Timestamp('2021-09-30 00:00:00'), 13: pd.Timestamp('2021-04-30 00:00:00'), 14: pd.Timestamp('2021-05-31 00:00:00')}})
# This will exit Python
b = (
    a.set_index('date')
    .groupby('class')
    .resample('M')['value']
    .agg(['sum', 'size'])
)
# Not selecting a column will ALSO make Python exit
c = (
    a.set_index('date')
    .groupby('class')
    .resample('M')
    .agg(['sum', 'size'])
)
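Because the interpreter exits instead of raising, it can help to run the reproducer in a child process and inspect its return code. The following is a minimal sketch using only the standard library; `run_isolated` is a hypothetical helper name, and the `snippet` string is a placeholder standing in for the failing chain above.

```python
import subprocess
import sys


def run_isolated(code: str, timeout: float = 120.0):
    """Run `code` in a fresh interpreter and return (returncode, stderr tail)."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    # Keep only the end of stderr so a very long traceback stays readable.
    return proc.returncode, proc.stderr[-2000:]


# Placeholder: replace with the reproducer from this report.
snippet = "import sys; sys.exit(3)"
returncode, stderr_tail = run_isolated(snippet)
print("return code:", returncode)
print("stderr tail:", stderr_tail)
```

A nonzero return code with little or nothing on stderr would suggest the interpreter died rather than raised a Python exception, while a traceback on stderr would show where the failure starts.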
Problem description
I'm not sure whether this method is supported for DatetimeIndexResamplerGroupby objects, but the attribute exists: referencing it without calling it gives:
<bound method Resampler.aggregate of <pandas.core.resample.DatetimeIndexResamplerGroupby object at 0x00000163B22B0100>>
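The bound-method output above only shows that the attribute exists, not that the aggregation works. A small sketch of the distinction, using a made-up three-row frame (not the data from this report) and `pd.offsets.MonthEnd()` in place of the `'M'` alias:

```python
import pandas as pd

# Illustrative data only; column names mirror the report.
df = pd.DataFrame(
    {
        "class": ["alpha", "beta", "alpha"],
        "value": [1, 2, 3],
        "date": pd.to_datetime(["2021-01-31", "2021-01-31", "2021-02-28"]),
    }
)
resampler = df.set_index("date").groupby("class").resample(pd.offsets.MonthEnd())

# Attribute lookup only: no aggregation runs on this line.
method = resampler.agg
print(type(resampler).__name__)
print(callable(method))
```

The lookup succeeds regardless of whether a later call with a list of functions would fail, so it says nothing about support for that call form.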
Also, while the problem arises with either a Series or a DataFrame, using agg with multiple functions on a plain SeriesGroupBy correctly creates a DataFrame, so I would expect the same when resampling by timestamp:
In [1]: a.groupby('class')['value'].agg(['sum', 'size'])
Out[1]:
       sum  size
class
alpha  206     5
beta   312     5
gaga   236     5
Expected Output
                  sum  size
class date
alpha 2021-02-28   40     1
      2021-03-31    0     0
      2021-04-30   55     1
      2021-05-31   41     1
      2021-06-30    0     0
      2021-07-31    0     0
      2021-08-31    0     0
      2021-09-30   37     1
      2021-10-31    0     0
      2021-11-30   33     1
beta  2021-01-31   77     1
      2021-02-28  105     2
      2021-03-31    0     0
      2021-04-30    0     0
      2021-05-31    0     0
      2021-06-30    0     0
      2021-07-31   48     1
      2021-08-31    0     0
      2021-09-30    0     0
      2021-10-31   82     1
gaga  2021-01-31  106     2
      2021-02-28    0     0
      2021-03-31    0     0
      2021-04-30  130     3
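One possible workaround, sketched here and not verified against pandas 1.3.1, is to avoid the list form of agg: compute each aggregation separately on the resampled group and join the results by column. It assumes that the single-function methods `sum()` and `size()` are unaffected, and it uses `pd.offsets.MonthEnd()` instead of the `'M'` alias so that it does not depend on alias spelling.

```python
import pandas as pd

# Same data as in the code sample, written as plain lists.
a = pd.DataFrame(
    {
        "class": ["beta", "alpha", "alpha", "gaga", "beta", "gaga", "beta", "gaga",
                  "beta", "gaga", "alpha", "beta", "alpha", "gaga", "alpha"],
        "value": [69, 33, 40, 2, 36, 40, 48, 84, 77, 22, 55, 82, 37, 88, 41],
        "date": pd.to_datetime([
            "2021-02-28", "2021-11-30", "2021-02-28", "2021-04-30", "2021-02-28",
            "2021-04-30", "2021-07-31", "2021-01-31", "2021-01-31", "2021-01-31",
            "2021-04-30", "2021-10-31", "2021-09-30", "2021-04-30", "2021-05-31",
        ]),
    }
)

resampled = (
    a.set_index("date")
    .groupby("class")
    .resample(pd.offsets.MonthEnd())["value"]
)

# One aggregation per call, then align on the shared (class, date) index.
result = pd.concat({"sum": resampled.sum(), "size": resampled.size()}, axis=1)
print(result)
```

The trade-off is one pass over the groups per aggregation instead of a single agg call.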
Output of pd.show_versions()
INSTALLED VERSIONS
commit : c7f7443
python : 3.9.2.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.18362
machine : AMD64
processor : Intel64 Family 6 Model 142 Stepping 9, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : pt_BR.cp1252
pandas : 1.3.1
numpy : 1.20.2
pytz : 2021.1
dateutil : 2.8.1
pip : 21.2.1
setuptools : 49.2.1
Cython : None
pytest : None
hypothesis : None
sphinx : 3.5.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.6.3
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.3
IPython : 7.24.1
pandas_datareader: None
bs4 : 4.9.3
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.4.1
numexpr : None
odfpy : None
openpyxl : 3.0.6
pandas_gbq : None
pyarrow : None
pyxlsb : 1.0.8
s3fs : None
scipy : 1.7.0
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None