API: provide Rolling/Expanding/EWM objects for deferred rolling type calculations #10702 by jreback · Pull Request #11603 · pandas-dev/pandas
closes #10702
closes #9052
xref #4950, removal of depr
So this basically takes all of the pd.rolling_*, pd.expanding_* and pd.ewm* routines and provides an object-oriented interface for them, similar to groupby.
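A minimal sketch of the deferred style, assuming a current pandas release (where the old pd.rolling_* functions no longer exist): building the Rolling object only records the window specification, and each method call then runs one calculation over it. The s + s.shift(1) comparison is just an illustration that holds for a window of 2.

```python
import pandas as pd

s = pd.Series(range(5), dtype="float64")

# Creating the window object only records the window spec; nothing is computed yet.
r = s.rolling(window=2)

# Each method call runs one rolling calculation over that same spec.
rolling_sum = r.sum()
rolling_max = r.max()

# For window=2 the rolling sum is the current value plus the previous one.
manual_sum = s + s.shift(1)
print(rolling_sum.equals(manual_sum))  # True
```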
Some benefits:
- nice tab completions on the Rolling/Expanding/EWM objects
- much cleaner code internally
- complete back compat, i.e. everything works just like it did before
- added a .agg/aggregate function, similar to groupby, where you can do multiple aggregations at once
- added __getitem__ accessing, e.g. df.rolling(....)['A','B'].sum(), for a nicer API
- allows much of #8659 (API/ENH: master issue for pd.rolling_apply) to be done very easily
- fix for coercing Timedeltas properly
- handling nuisance (string) columns
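A sketch of column selection and a custom reduction on a window object, assuming a current pandas release. Note that current pandas selects several columns with a list (r[["A", "B"]]) rather than the tuple form ['A','B'] used in this PR; peak_to_peak is a hypothetical helper for illustration, standing in for the kind of function one would have passed to pd.rolling_apply.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": range(5), "B": [2.0, 4.0, 8.0, 16.0, 32.0], "C": range(10, 15)})
r = df.rolling(window=3)

# Select a subset of columns on the window object, much like groupby selection.
subset = r[["A", "B"]].sum()

# Selecting a single column gives a Series-based window.
a_mean = r["A"].mean()


# Hypothetical custom reduction; raw=True passes each window as a NumPy array.
def peak_to_peak(window: np.ndarray) -> float:
    return float(window.max() - window.min())


b_range = r["B"].apply(peak_to_peak, raw=True)
print(b_range)
```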
Other:
- along with the window docs rewrite, fixed docstrings for groupby/window to provide back-references
TODO:
- I think that all of the docstrings are correct, but they need checking
- implement .agg
- update API.rst and the what's new
- deprecate the pd.expanding_*, pd.rolling_* and pd.ewm* interface, as this is polluting the top-level namespace quite a bit
- change the docs to use the new API
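One way a deprecation like this can keep complete back compat is a thin wrapper that warns and forwards to the new method. This is a hypothetical sketch: rolling_sum below is a locally defined stand-in, not the PR's actual implementation, and the choice of FutureWarning is an assumption.

```python
import warnings

import pandas as pd


def rolling_sum(arg, window, min_periods=None):
    """Hypothetical old-style wrapper that forwards to the method-based API."""
    warnings.warn(
        "rolling_sum is deprecated; use .rolling(window=...).sum() instead",
        FutureWarning,
        stacklevel=2,
    )
    return arg.rolling(window=window, min_periods=min_periods).sum()


s = pd.Series(range(5), dtype="float64")

# Record the warning so the old call path can be checked without noise.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_style = rolling_sum(s, 2)

new_style = s.rolling(window=2).sum()
print(old_style.equals(new_style))  # True
```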
In [4]: df = DataFrame({'A' : range(5), 'B' : pd.timedelta_range('1 day',periods=5), 'C' : 'foo'})
In [5]: df.rolling(window=2).sum()
Out[5]:
A B C
0 NaN NaT foo
1 1 3 days foo
2 3 5 days foo
3 5 7 days foo
4 7 9 days foo
In [6]: df.rolling(window=2)['A','C'].sum()
Out[6]:
A C
0 NaN foo
1 1 foo
2 3 foo
3 5 foo
4 7 foo
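The session above reflects this PR, where the string column C is passed through unchanged. Recent pandas releases do not pass such columns through; a sketch assuming pandas 1.5 or later, where the window aggregations accept numeric_only=True to restrict the calculation to numeric columns:

```python
import pandas as pd

df = pd.DataFrame({"A": range(5), "C": "foo"})

# numeric_only=True limits the calculation to numeric columns,
# so the string column "C" is left out of the result.
result = df.rolling(window=2).sum(numeric_only=True)
print(result)
```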
In [2]: r = df.rolling(window=3)
In [3]: r.
r.A r.C r.corr r.cov r.max r.median r.name r.skew r.sum
r.B r.apply r.count r.kurt r.mean r.min r.quantile r.std r.var
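Attribute completion in interactive shells generally draws on dir(), so a rough way to see what r. would offer in a current pandas release is to list the public names. The exact list varies between versions, so this sketch only checks for the aggregation methods shown in the completion listing above.

```python
import pandas as pd

df = pd.DataFrame({"A": range(5), "B": range(5, 10)})
r = df.rolling(window=3)

# Public (non-underscore) names reported by dir(), roughly what completion shows.
public = sorted(name for name in dir(r) if not name.startswith("_"))
print(public)
```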
do rolling/expanding/ewma ops
In [1]: s = Series(range(5))
In [2]: r = s.rolling(2)
# pd.rolling_sum
In [3]: r.sum()
Out[3]:
0 NaN
1 1
2 3
3 5
4 7
dtype: float64
# nicer repr
In [4]: r
Out[4]: Rolling [window->2,center->False,axis->0]
In [5]: e = s.expanding(min_periods=2)
# pd.expanding_sum
In [6]: e.sum()
Out[6]:
0 NaN
1 1
2 3
3 6
4 10
dtype: float64
In [7]: em = s.ewm(com=10)
# pd.ewma
In [8]: em.mean()
Out[8]:
0 0.000000
1 0.523810
2 1.063444
3 1.618832
4 2.189874
dtype: float64
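A sketch that reproduces the rolling, expanding and EWM results of the session above by hand, assuming a current pandas release. It also shows the min_periods and center parameters that appear in the Rolling repr. The EWM part follows pandas' default adjust=True weighting: with alpha = 1 / (1 + com), each output is a weighted average of all observations so far, with weight (1 - alpha) ** age. For com=10 the second value is 11/21, about 0.523810, as in the output above.

```python
import numpy as np
import pandas as pd

s = pd.Series(range(5), dtype="float64")
x = s.to_numpy()

# Rolling: min_periods=1 fills the first slot; center=True labels each window by its middle.
partial = s.rolling(window=2, min_periods=1).sum()  # 0, 1, 3, 5, 7
centered = s.rolling(window=3, center=True).sum()   # NaN, 3, 6, 9, NaN

# Expanding sum is a cumulative sum, masked until min_periods observations exist.
expanding = s.expanding(min_periods=2).sum()
expected_expanding = s.cumsum().where(s.expanding().count() >= 2)

# Adjusted EWM mean: x[i] gets weight (1 - alpha) ** (t - i) at time t.
com = 10
alpha = 1.0 / (1.0 + com)
manual_ewm = np.array([
    np.sum((1 - alpha) ** np.arange(t, -1, -1) * x[: t + 1])
    / np.sum((1 - alpha) ** np.arange(t + 1))
    for t in range(len(x))
])
print(manual_ewm)
```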
and allow various aggregation-type ops (similar to groupby):
In [1]: df = DataFrame({'A' : range(5), 'B' : pd.timedelta_range('1 day',periods=5), 'C' : 'foo'})
In [2]: r = df.rolling(2,min_periods=1)
In [3]: r.agg([np.sum,np.mean])
Out[3]:
A B C
sum mean sum mean sum mean
0 0 0.0 1 days 1 days 00:00:00 foo foo
1 1 0.5 3 days 1 days 12:00:00 foo foo
2 3 1.5 5 days 2 days 12:00:00 foo foo
3 5 2.5 7 days 3 days 12:00:00 foo foo
4 7 3.5 9 days 4 days 12:00:00 foo foo
In [4]: r.agg({'A' : 'sum', 'B' : 'mean'})
Out[4]:
A B
0 0 1 days 00:00:00
1 1 1 days 12:00:00
2 3 2 days 12:00:00
3 5 3 days 12:00:00
4 7 4 days 12:00:00
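A sketch of the same list and dict aggregations in a current pandas release. It uses string function names in place of np.sum/np.mean and sticks to numeric columns, since the timedelta and string handling shown in the session above is not something to assume for current releases. The last step shows that the dict form amounts to selecting each column and aggregating it on its own.

```python
import pandas as pd

df = pd.DataFrame({"A": range(5), "B": [10.0, 20.0, 30.0, 40.0, 50.0]})
r = df.rolling(window=2, min_periods=1)

# A list applies every function to every column, giving MultiIndex columns.
by_list = r.agg(["sum", "mean"])

# A dict maps each column to its own function.
by_dict = r.agg({"A": "sum", "B": "mean"})

# Equivalent by hand: select each column on the window object, then aggregate.
by_hand = pd.concat([r["A"].sum(), r["B"].mean()], axis=1)
print(by_list)
```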