API: provide Rolling/Expanding/EWM objects for deferred rolling type calculations #10702 by jreback · Pull Request #11603 · pandas-dev/pandas

closes #10702
closes #9052
xref #4950, removal of depr

This takes all of the pd.rolling_*, pd.expanding_*, and pd.ewm* routines and exposes them through an object-oriented interface, similar to groupby.

Some benefits:

Other:

TODO:

In [4]: df = pd.DataFrame({'A' : range(5), 'B' : pd.timedelta_range('1 day',periods=5), 'C' : 'foo'})

In [5]: df.rolling(window=2).sum()
Out[5]: 
    A      B    C
0 NaN    NaT  foo
1   1 3 days  foo
2   3 5 days  foo
3   5 7 days  foo
4   7 9 days  foo

In [6]: df.rolling(window=2)['A','C'].sum()
Out[6]: 
    A    C
0 NaN  foo
1   1  foo
2   3  foo
3   5  foo
4   7  foo

In [2]: r = df.rolling(window=3)

In [3]: r.
r.A         r.C         r.corr      r.cov       r.max       r.median    r.name      r.skew      r.sum       
r.B         r.apply     r.count     r.kurt      r.mean      r.min       r.quantile  r.std       r.var       
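
The column selection and the r.A / r.B completions above can be sketched as a standalone script. This is a minimal sketch rather than the PR's own test code: the numeric-only frame and the variable names are assumptions chosen for illustration, and it uses the list-style `[[...]]` selection.

```python
import pandas as pd

# Numeric-only frame: an assumption made so the sketch does not depend on
# how non-numeric columns such as 'C' above are handled.
df = pd.DataFrame({"A": range(5), "B": [1.0, 2.0, 4.0, 8.0, 16.0]})

r = df.rolling(window=2)

# List-style selection gives a rolling object restricted to those columns.
subset_sum = r[["A", "B"]].sum()

# Attribute access (the r.A / r.B completions) selects a single column,
# and is expected to match item access with the same name.
a_sum_attr = r.A.sum()
a_sum_item = r["A"].sum()

print(subset_sum)
print(a_sum_attr.equals(a_sum_item))
```

As with groupby, the selection is deferred: nothing is computed until an aggregation such as sum() is called on the selected object.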

Do rolling/expanding/ewm ops:

In [1]: s = pd.Series(range(5))

In [2]: r = s.rolling(2)

# pd.rolling_sum
In [3]: r.sum()
Out[3]: 
0   NaN
1     1
2     3
3     5
4     7
dtype: float64

# nicer repr
In [4]: r
Out[4]: Rolling [window->2,center->False,axis->0]

In [5]: e = s.expanding(min_periods=2)

# pd.expanding_sum
In [6]: e.sum()
Out[6]: 
0   NaN
1     1
2     3
3     6
4    10
dtype: float64

In [7]: em = s.ewm(com=10)

# pd.ewma
In [8]: em.mean()
Out[8]: 
0    0.000000
1    0.523810
2    1.063444
3    1.618832
4    2.189874
dtype: float64

Allow the various aggregation-type ops (similar to groupby):

In [1]: df = pd.DataFrame({'A' : range(5), 'B' : pd.timedelta_range('1 day',periods=5), 'C' : 'foo'})

In [2]: r = df.rolling(2,min_periods=1)

In [3]: r.agg([np.sum,np.mean])
Out[3]: 
    A           B                    C     
  sum mean    sum            mean  sum mean
0   0  0.0 1 days 1 days 00:00:00  foo  foo
1   1  0.5 3 days 1 days 12:00:00  foo  foo
2   3  1.5 5 days 2 days 12:00:00  foo  foo
3   5  2.5 7 days 3 days 12:00:00  foo  foo
4   7  3.5 9 days 4 days 12:00:00  foo  foo

In [4]: r.agg({'A' : 'sum', 'B' : 'mean'})
Out[4]: 
   A               B
0  0 1 days 00:00:00
1  1 1 days 12:00:00
2  3 2 days 12:00:00
3  5 3 days 12:00:00
4  7 4 days 12:00:00
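
The two agg forms above can also be sketched as a script. This is an illustrative sketch, not the PR's test code: it assumes a numeric-only frame (column B here is a made-up float column rather than the timedelta column above) and passes the aggregations by string name instead of np.sum / np.mean.

```python
import pandas as pd

# Numeric-only frame, chosen for illustration.
df = pd.DataFrame({"A": range(5), "B": [10.0, 20.0, 30.0, 40.0, 50.0]})

r = df.rolling(2, min_periods=1)

# A list of aggregations yields a column MultiIndex of (column, aggregation).
multi = r.agg(["sum", "mean"])

# A dict maps each column to its own aggregation.
per_col = r.agg({"A": "sum", "B": "mean"})

print(multi)
print(per_col)
```

With min_periods=1 the first row is computed from a single observation, which is why column A starts at 0 rather than NaN, as in the output above.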