API: meaning of min_periods for ewm*() functions? · Issue #7977 · pandas-dev/pandas

The interpretation of min_periods in the ewm*() functions seems rather odd to me. For example (in 0.14.1):

```
In [19]: x
Out[19]:
0     0
1   NaN
2   NaN
3   NaN
4     4
5   NaN
6     6
dtype: float64

In [20]: ewma(x, com=3., min_periods=2)
Out[20]:
0         NaN
1         NaN
2    0.000000
3    0.000000
4    2.285714
5    2.285714
6    3.891892
dtype: float64
```

As it currently works, it finds the first non-NaN value (0 in the example above) and then makes sure that the first min_periods entries of the result, starting at that entry, are NaN (min_periods-1 entries in 0.15.0, per #7898). Does it make any sense that the result has entry 0 set to NaN, but entries 2 and 3 (and 1 in 0.15.0) set to 0.0?

I would have thought that the values to be explicitly NaNed would be those determined by x.notnull().cumsum() < min_periods. This would be consistent with the meaning of min_periods in the rolling_*() and expanding_*() functions.
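To make the difference concrete, here is a rough sketch comparing the two masking rules on the series above. It is not the actual ewma implementation: `described_mask` is just my reading of the 0.14.1 behaviour described above, and both mask names are made up for illustration.

```python
import numpy as np
import pandas as pd

x = pd.Series([0, np.nan, np.nan, np.nan, 4, np.nan, 6], dtype="float64")
min_periods = 2

# Rule as described above for 0.14.1 (an illustration, not the library code):
# everything up to and including the first min_periods positions,
# counted from the first non-NaN entry, is NaN in the result.
first = int(np.flatnonzero(x.notnull().values)[0])
positions = np.arange(len(x))
described_mask = positions < first + min_periods

# Proposed rule: NaN wherever fewer than min_periods non-NaN
# observations have been seen so far.
proposed_mask = x.notnull().cumsum() < min_periods

print(described_mask.tolist())
print(proposed_mask.tolist())
```

With the described rule only entries 0 and 1 are masked, whereas with the proposed rule entries 0 through 3 are masked, since the second observation only arrives at entry 4.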

CC'ing @snth and @jaimefrio, in case they have opinions.