Bug in rolling_* functions when combining center and min_periods arguments · Issue #7925 · pandas-dev/pandas
Hello,
I have found incorrect behavior in the pd.rolling_* functions when the 'center' and 'min_periods' arguments are combined. I believe this affects all rolling_* functions. Here is an example using rolling_mean:
When using only the 'center' argument, the behavior works as expected:
In [5]: df = pd.DataFrame(range(10))

In [6]: df
Out[6]:
   0
0  0
1  1
2  2
3  3
4  4
5  5
6  6
7  7
8  8
9  9

In [7]: pd.rolling_mean(df, window=3, center=True)
Out[7]:
    0
0 NaN
1   1
2   2
3   3
4   4
5   5
6   6
7   7
8   8
9 NaN
When using the 'min_periods' argument, the behavior also works as expected:
In [8]: pd.rolling_mean(df, window=3, min_periods=1)
Out[8]:
     0
0  0.0
1  0.5
2  1.0
3  2.0
4  3.0
5  4.0
6  5.0
7  6.0
8  7.0
9  8.0
When combining both the 'center' and 'min_periods' arguments, however, the result is incorrect. I believe the last entry should be 8.5 (the mean of 8 and 9, the two values that fall inside the final centered window) rather than NaN.
In [9]: pd.rolling_mean(df, window=3, center=True, min_periods=1)
Out[9]:
     0
0  0.5
1  1.0
2  2.0
3  3.0
4  4.0
5  5.0
6  6.0
7  7.0
8  8.0
9  NaN
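To make the expected result concrete, here is a minimal pure-Python sketch of what I would expect a centered rolling mean with min_periods to compute. The function name centered_rolling_mean is hypothetical, and the window alignment (offset = (window - 1) // 2) is my assumption about how the window is centered; this is not the pandas implementation.

```python
import math


def centered_rolling_mean(values, window, min_periods=None):
    """Reference centered rolling mean (illustrative sketch, not pandas code).

    Assumption: the window for position i spans
    [i + offset - window + 1, i + offset], with offset = (window - 1) // 2,
    and is simply truncated at both ends of the data.
    """
    if min_periods is None:
        # Without min_periods, a full window of observations is required.
        min_periods = window
    offset = (window - 1) // 2
    n = len(values)
    out = []
    for i in range(n):
        stop = i + offset + 1      # exclusive upper bound of the window
        start = stop - window      # inclusive lower bound (may be negative)
        # Clip the window to the data and drop missing values.
        chunk = [v for v in values[max(start, 0):min(stop, n)]
                 if not math.isnan(v)]
        if chunk and len(chunk) >= min_periods:
            out.append(sum(chunk) / len(chunk))
        else:
            out.append(float("nan"))
    return out


expected = centered_rolling_mean(list(range(10)), window=3, min_periods=1)
print(expected)
```

Under these assumptions the first entry is 0.5 (the mean of 0 and 1) and the last entry is 8.5 (the mean of 8 and 9), so the result is symmetric at both edges, unlike Out[9] above.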
In [10]: pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 2.7.3.final.0
python-bits: 64
OS: Linux
OS-release: 3.2.0-4-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.13.1
Cython: 0.19.2
numpy: 1.8.0
scipy: 0.13.0
statsmodels: 0.6.0.dev-bed3499
IPython: 2.0.0
sphinx: 1.2.1
patsy: 0.2.1
scikits.timeseries: None
dateutil: 1.5
pytz: 2012c
bottleneck: None
tables: 2.3.1
numexpr: 2.0.1
matplotlib: 1.3.1
openpyxl: 1.5.8
xlrd: 0.6.1
xlwt: 0.7.4
xlsxwriter: None
sqlalchemy: None
lxml: None
bs4: None
html5lib: None
bq: None
apiclient: None