Rolling groupby should not maintain the by column in the resulting DataFrame · Issue #14013 · pandas-dev/pandas (original) (raw)

I found another oddity while digging through #13966.

Begin with the initial DataFrame in that issue:

df = pd.DataFrame({'A': [1] * 20 + [2] * 12 + [3] * 8,
                   'B': np.arange(40)})

Save the grouping:

In [215]: g = df.groupby('A')

Compute the rolling sum:

In [216]: r = g.rolling(4)

In [217]: r.sum()
Out[217]:
         A      B
A
1 0    NaN    NaN
  1    NaN    NaN
  2    NaN    NaN
  3    4.0    6.0
  4    4.0   10.0
  5    4.0   14.0
  6    4.0   18.0
  7    4.0   22.0
  8    4.0   26.0
  9    4.0   30.0
...    ...    ...
2 30   8.0  114.0
  31   8.0  118.0
3 32   NaN    NaN
  33   NaN    NaN
  34   NaN    NaN
  35  12.0  134.0
  36  12.0  138.0
  37  12.0  142.0
  38  12.0  146.0
  39  12.0  150.0

[40 rows x 2 columns]

It maintains the by column (A)! That column should not be in the resulting DataFrame.

It gets weirder if I compute the sum over the entire grouping and then re-do the rolling calculation. Now by column is gone as expected:

In [218]: g.sum()
Out[218]:
     B
A
1  190
2  306
3  284

In [219]: r.sum()
Out[219]:
          B
A
1 0     NaN
  1     NaN
  2     NaN
  3     6.0
  4    10.0
  5    14.0
  6    18.0
  7    22.0
  8    26.0
  9    30.0
...     ...
2 30  114.0
  31  118.0
3 32    NaN
  33    NaN
  34    NaN
  35  134.0
  36  138.0
  37  142.0
  38  146.0
  39  150.0

[40 rows x 1 columns]

So the grouping summation has some sort of side effect.