API: meaning of min_periods for ewm*() functions?
The interpretation of min_periods in the ewm*() functions seems rather odd to me. For example (in 0.14.1):
In [19]: x
Out[19]:
0     0
1   NaN
2   NaN
3   NaN
4     4
5   NaN
6     6
dtype: float64

In [20]: ewma(x, com=3., min_periods=2)
Out[20]:
0         NaN
1         NaN
2    0.000000
3    0.000000
4    2.285714
5    2.285714
6    3.891892
dtype: float64
As currently implemented, it finds the first non-NaN value (the entry at index 0 in the example above) and then forces the first min_periods entries of the result, counting from that entry, to be NaN (min_periods-1 entries in 0.15.0, per #7898). Does it make any sense that the result has entry 0 set to NaN, but entries 2 and 3 (and 1 in 0.15.0) set to 0.0?
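To make the described behavior concrete, here is a minimal sketch of which entries end up forced to NaN under the 0.14.1 rule. The helper `described_nan_mask` is hypothetical (it is not a pandas function) and only mirrors the description above, not the actual ewma implementation; it also assumes that entries before the first non-NaN value are NaN simply because there is no data yet.

```python
import numpy as np
import pandas as pd


def described_nan_mask(x, min_periods):
    """Hypothetical helper: True where the result is NaN under the 0.14.1 rule.

    Mirrors the description in this issue only; it does not call ewma.
    """
    valid = x.notnull().values
    # Start with everything masked; an all-NaN input stays all NaN.
    mask = np.ones(len(x), dtype=bool)
    if valid.any():
        first = int(valid.argmax())  # position of the first non-NaN value
        # Only the min_periods entries starting at `first` (plus any leading
        # NaNs, by assumption) are masked; everything after is left alone.
        mask[first + min_periods:] = False
    return pd.Series(mask, index=x.index)


x = pd.Series([0.0, np.nan, np.nan, np.nan, 4.0, np.nan, 6.0])
described = described_nan_mask(x, min_periods=2)
print(described.tolist())
# [True, True, False, False, False, False, False]
```

The mask depends only on the position of the first non-NaN value, which is why entries 2 and 3 are reported even though only one observation has been seen by then.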
I would have expected the entries explicitly set to NaN to be those where x.notnull().cumsum() < min_periods, that is, every entry at which fewer than min_periods non-NaN observations have been seen so far. This would be consistent with the meaning of min_periods in the rolling_*() and expanding_*() functions.
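As a sketch of the proposed rule on the example series, the mask can be computed directly. The comparison with an expanding mean uses the `Series.expanding(...)` method from later pandas releases, which is an assumption here, since the versions discussed in this issue expose expanding_*() functions instead.

```python
import numpy as np
import pandas as pd

x = pd.Series([0.0, np.nan, np.nan, np.nan, 4.0, np.nan, 6.0])
min_periods = 2

# Proposed rule: NaN wherever fewer than min_periods non-NaN
# observations have been seen so far.
proposed = x.notnull().cumsum() < min_periods
print(proposed.tolist())
# [True, True, True, True, False, False, False]

# For comparison: where an expanding mean with the same min_periods is NaN
# (assumes a pandas version that provides Series.expanding).
expanding_mean = x.expanding(min_periods=min_periods).mean()
print(expanding_mean.isnull().tolist())
```

Under this rule, entries 0 through 3 would all be NaN, and the first reported value would appear at entry 4, where the second observation arrives.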
CC'ing @snth and @jaimefrio, in case they have opinions.