Bug when computing rolling_mean with extreme value

Hello,
Please consider the following code:

import pandas as pd
import numpy as ny

dates = pd.date_range("2015-01-01", periods=10, freq="D")
ts = pd.TimeSeries(data=range(10), index=dates, dtype=ny.float64)
ts_mean = pd.rolling_mean(ts, 5)
print(ts) 
2015-01-01    0
2015-01-02    1
2015-01-03    2
2015-01-04    3
2015-01-05    4
2015-01-06    5
2015-01-07    6
2015-01-08    7
2015-01-09    8
2015-01-10    9
Freq: D, dtype: float64

print(ts_mean)
2015-01-01   NaN
2015-01-02   NaN
2015-01-03   NaN
2015-01-04   NaN
2015-01-05     2
2015-01-06     3
2015-01-07     4
2015-01-08     5
2015-01-09     6
2015-01-10     7
Freq: D, dtype: float64

For the last date (2015-01-10), you should obtain 7, which is the mean of [5, 6, 7, 8, 9].
Now, please replace the 2015-01-03 value with the extreme value -9e+33.

dates = pd.date_range("2015-01-01", periods=10, freq="D")
ts = pd.TimeSeries(data=range(10), index=dates, dtype=ny.float64)
ts[2] = -9e+33
print(ts)
2015-01-01    0.000000e+00
2015-01-02    1.000000e+00
2015-01-03   -9.000000e+33
2015-01-04    3.000000e+00
2015-01-05    4.000000e+00
2015-01-06    5.000000e+00
2015-01-07    6.000000e+00
2015-01-08    7.000000e+00
2015-01-09    8.000000e+00
2015-01-10    9.000000e+00
Freq: D, dtype: float64

Then compute rolling_mean again:

ts_mean = pd.rolling_mean(ts, 5)
print(ts_mean)
2015-01-01             NaN
2015-01-02             NaN
2015-01-03             NaN
2015-01-04             NaN
2015-01-05   -1.800000e+33
2015-01-06   -1.800000e+33
2015-01-07   -1.800000e+33
2015-01-08    0.000000e+00
2015-01-09    1.000000e+00
2015-01-10    2.000000e+00
Freq: D, dtype: float64

As you can see, from 2015-01-08 onward the computation returns incorrect results, i.e. [0, 1, 2] instead of [5, 6, 7]. The extreme value keeps perturbing the results for the following dates, even though it is no longer inside their 5-day windows.
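
A possible explanation, which is only my assumption about how rolling_mean works internally, is that it keeps a running sum: it adds the value entering the window and subtracts the one leaving it. In float64, adding a small number to -9e+33 changes nothing, so the small values are lost and never come back once the extreme value is subtracted. The sketch below uses plain Python and two hypothetical helper functions (rolling_mean_running_sum and rolling_mean_recomputed are my own names, not pandas functions) to compare that approach with recomputing each window from scratch:

```python
import math

def rolling_mean_running_sum(values, window):
    """Rolling mean using an add/subtract running sum (assumed algorithm)."""
    out = []
    total = 0.0
    for i, v in enumerate(values):
        total += v                       # value entering the window
        if i >= window:
            total -= values[i - window]  # value leaving the window
        out.append(total / window if i >= window - 1 else math.nan)
    return out

def rolling_mean_recomputed(values, window):
    """Rolling mean that sums every window independently."""
    out = []
    for i in range(len(values)):
        if i < window - 1:
            out.append(math.nan)
        else:
            out.append(math.fsum(values[i - window + 1:i + 1]) / window)
    return out

values = [float(v) for v in range(10)]
values[2] = -9e33  # same extreme value as in the report

naive = rolling_mean_running_sum(values, 5)
exact = rolling_mean_recomputed(values, 5)

print(naive[-3:])  # [0.0, 1.0, 2.0]  same wrong values as pd.rolling_mean above
print(exact[-3:])  # [5.0, 6.0, 7.0]  the expected values
```

The running-sum version reproduces the wrong values shown above for the last three dates, which suggests (without proving it) that this kind of cancellation is the cause.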

Best regards,