BUG: df.groupby().resample()[[cols]]
without key columns raise KeyError · Issue #47362 · pandas-dev/pandas (original) (raw)
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import numpy as np import pandas as pd
rng = np.random.default_rng(0) df_re = pd.DataFrame( { "date": pd.date_range(start="2016-01-01", periods=50), "group": rng.integers(0, 2, 50), "val": rng.integers(0, 10, 50), } )
df_re.groupby("group").resample("10D", on="date")[["val"]].mean()
Issue Description
I have a DataFrame.
import numpy as np import pandas as pd
rng = np.random.default_rng(0) df_re = pd.DataFrame( { "date": pd.date_range(start="2016-01-01", periods=50), "group": rng.integers(0, 2, 50), "val": rng.integers(0, 10, 50), } )
df_re.head()
date group val
0 2016-01-01 1 7
1 2016-01-02 1 3
2 2016-01-03 1 4
3 2016-01-04 0 9
4 2016-01-05 0 8
The following is simple groupby()
and resample()
case.
df_re.groupby("group").mean() df_re.groupby("group")["val"].mean() # -> Series df_re.groupby("group")[["val"]].mean() # -> DataFrame
df_re.resample("10D", on="date").mean() df_re.resample("10D", on="date")["val"].mean() # -> Series df_re.resample("10D", on="date")[["val"]].mean() # -> DataFrame
If []
is added, the Series is returned; if [[]]
is added, the DataFrame is returned. It is not necessary to include the group key("group"
or "date"
) in the []
/[[]]
.
For multi-groups, pd.Grouper()
can be used. It is working the same way.
df_re.groupby(["group", pd.Grouper(freq="10D", key="date")]).mean() df_re.groupby(["group", pd.Grouper(freq="10D", key="date")])["val"].mean() # -> Series df_re.groupby(["group", pd.Grouper(freq="10D", key="date")])[["val"]].mean() # -> DataFrame
However, groupby().resample()
works a little differently. It cannot return the DataFrame without passing key columns of resample()
.
df_re.groupby("group").resample("10D", on="date").mean() df_re.groupby("group").resample("10D", on="date")["val"].mean() # -> Series
df_re.groupby("group").resample("10D", on="date")[["val"]].mean()
-> KeyError: 'The grouper name date is not found'
df_re.groupby("group").resample("10D", on="date")[["val", "date"]].mean() # -> DataFrame
I think this behavior is unnatural. (Even if this is not a bug, this error message is difficult to understand.)
Expected Behavior
df_re.groupby("group").resample("10D", on="date")[["val"]].mean()
should return a DataFrame which is equal to df_re.groupby("group").resample("10D", on="date")[["val", "date"]].mean()
and df_re.groupby("group").resample("10D", on="date")["val"].mean().to_frame()
Installed Versions
INSTALLED VERSIONS
commit : 4bfe3d0
python : 3.10.4.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19044
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 9, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : Japanese_Japan.932
pandas : 1.4.2
numpy : 1.21.6
pytz : 2022.1
dateutil : 2.8.2
pip : 22.1.2
setuptools : 62.3.4
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.0
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.4.0
pandas_datareader: None
bs4 : None
bottleneck : None
brotli :
fastparquet : None
fsspec : 2022.5.0
gcsfs : None
markupsafe : 2.1.1
matplotlib : 3.5.2
numba : 0.55.1
numexpr : 2.8.0
odfpy : None
openpyxl : 3.0.9
pandas_gbq : None
pyarrow : 8.0.0
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.8.1
snappy : None
sqlalchemy : None
tables : None
tabulate : 0.8.9
xarray : None
xlrd : None
xlwt : None
zstandard : None