API: meaning of min_periods for ewm*() functions? · Issue #7977 · pandas-dev/pandas
The interpretation of min_periods in the ewm*() functions seems rather odd to me. For example (in 0.14.1):
In [19]: x
Out[19]:
0     0
1   NaN
2   NaN
3   NaN
4     4
5   NaN
6     6
dtype: float64

In [20]: ewma(x, com=3., min_periods=2)
Out[20]:
0         NaN
1         NaN
2    0.000000
3    0.000000
4    2.285714
5    2.285714
6    3.891892
dtype: float64
The way it works is that it finds the first non-NaN value (0 in the example above) and then makes sure that the min_periods entries (min_periods-1 in 0.15.0, per #7898) in the result starting at that entry are NaN. Does it make any sense that the result has entry 0 set to NaN, but entries 2 and 3 (and 1 in 0.15.0) set to 0.0?
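To make the rule described above concrete, here is a minimal sketch of the NaN mask it implies for the 0.14.1 example. It is only an illustration of the description, not the actual pandas implementation; the names first and described_mask are made up for this sketch, and it assumes the series has at least one non-NaN value.

```python
import numpy as np
import pandas as pd

x = pd.Series([0, np.nan, np.nan, np.nan, 4, np.nan, 6], dtype="float64")
min_periods = 2

# Position of the first non-NaN value (position 0 in this example).
# Assumption: x contains at least one non-NaN value.
first = int(np.flatnonzero(x.notna().to_numpy())[0])

# Rule as described for 0.14.1: the min_periods entries starting at that
# position are NaN, regardless of how many observations they contain.
# Entries before the first observation are NaN anyway.
positions = np.arange(len(x))
described_mask = pd.Series(positions < first + min_periods, index=x.index)

print(described_mask)
```

For this input the mask covers only entries 0 and 1, which lines up with the NaN pattern in Out[20], even though entry 0 is the only observation seen up to that point.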
I would have thought that the entries explicitly set to NaN would be those where x.notnull().cumsum() < min_periods. This would be consistent with the meaning of min_periods in the rolling_*() and expanding_*() functions.
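As a rough sketch of what that would look like on the example series: the mask below counts non-NaN observations and blanks the result until min_periods of them have been seen. It assumes a modern pandas with Series.ewm rather than the ewma() function used above, and ignore_na=True is an assumption chosen because it appears to match the weighting of the 0.14.1 output; the names n_obs, proposed_mask, raw and proposed are illustrative only.

```python
import numpy as np
import pandas as pd

x = pd.Series([0, np.nan, np.nan, np.nan, 4, np.nan, 6], dtype="float64")
min_periods = 2

# Running count of non-NaN observations seen so far: 1, 1, 1, 1, 2, 2, 3.
n_obs = x.notna().cumsum()

# Proposed rule: NaN wherever fewer than min_periods observations exist.
proposed_mask = n_obs < min_periods

# Compute the weighted mean with no min_periods handling at all, then
# apply the proposed mask afterwards (assumption: modern Series.ewm API).
raw = x.ewm(com=3.0, min_periods=0, ignore_na=True).mean()
proposed = raw.where(~proposed_mask)

print(proposed)
```

Under this rule entries 0 through 3 would all be NaN, and the first non-NaN result would appear at entry 4, where the second observation arrives.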
CC'ing @snth and @jaimefrio, in case they have opinions.