Rolling groupby should not maintain the by column in the resulting DataFrame · Issue #14013 · pandas-dev/pandas
I found another oddity while digging through #13966.
Begin with the initial DataFrame in that issue:
import numpy as np
import pandas as pd
df = pd.DataFrame({'A': [1] * 20 + [2] * 12 + [3] * 8,
                   'B': np.arange(40)})
Save the grouping:
In [215]: g = df.groupby('A')
Compute the rolling sum:
In [216]: r = g.rolling(4)
In [217]: r.sum()
Out[217]:
         A      B
A
1 0    NaN    NaN
  1    NaN    NaN
  2    NaN    NaN
  3    4.0    6.0
  4    4.0   10.0
  5    4.0   14.0
  6    4.0   18.0
  7    4.0   22.0
  8    4.0   26.0
  9    4.0   30.0
...    ...    ...
2 30   8.0  114.0
  31   8.0  118.0
3 32   NaN    NaN
  33   NaN    NaN
  34   NaN    NaN
  35  12.0  134.0
  36  12.0  138.0
  37  12.0  142.0
  38  12.0  146.0
  39  12.0  150.0

[40 rows x 2 columns]
It maintains the by column (A)! That column should not be in the resulting DataFrame.
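As a possible workaround until this is resolved, a minimal sketch: selecting B explicitly before calling rolling means only B is rolled, so A can appear only as an index level. The name rolled_b is just illustrative, and this assumes a pandas version where groupby(...).rolling(...) is available.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1] * 20 + [2] * 12 + [3] * 8,
                   'B': np.arange(40)})

# Select B before rolling, so the by column A is never a candidate
# for the rolling sum. The result is a Series indexed by
# (A, original row label).
rolled_b = df.groupby('A')['B'].rolling(4).sum()

print(rolled_b.loc[(1, 3)])   # B rows 0..3 in group 1 -> 6.0
print(rolled_b.loc[(3, 39)])  # B rows 36..39 in group 3 -> 150.0
```

The two printed values match the B column of the output above; the first three rows of each group are NaN because the window of 4 is not yet full.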
It gets weirder if I compute the sum over the entire grouping and then redo the rolling calculation. Now the by column is gone, as expected:
In [218]: g.sum()
Out[218]:
     B
A
1  190
2  306
3  284
In [219]: r.sum()
Out[219]:
          B
A
1 0     NaN
  1     NaN
  2     NaN
  3     6.0
  4    10.0
  5    14.0
  6    18.0
  7    22.0
  8    26.0
  9    30.0
...     ...
2 30  114.0
  31  118.0
3 32    NaN
  33    NaN
  34    NaN
  35  134.0
  36  138.0
  37  142.0
  38  146.0
  39  150.0

[40 rows x 1 columns]
So computing g.sum() appears to have some sort of side effect on the rolling object r, which was created from the same grouping before the summation.
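To check whether a given pandas installation shows this side effect, the session above can be condensed into a short script. This is only a sketch: cols_before, cols_after and side_effect are names made up here, and what it prints depends on the pandas version, so no expected output is given.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1] * 20 + [2] * 12 + [3] * 8,
                   'B': np.arange(40)})

g = df.groupby('A')
r = g.rolling(4)

# Columns of the rolling sum before any full-group aggregation.
cols_before = list(r.sum().columns)

# Full-group aggregation; the report suspects this changes state
# shared between g and r.
g.sum()

# Columns of the same rolling sum afterwards.
cols_after = list(r.sum().columns)

# True if the intervening g.sum() changed what r.sum() returns.
side_effect = cols_before != cols_after
print(pd.__version__, cols_before, cols_after, side_effect)
```

If side_effect is True and 'A' appears only in cols_before, the installation reproduces the behavior reported here.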