BUG: ewma() doesn't adjust weights properly for missing values · Issue #7543 · pandas-dev/pandas

ewma() simply ignores missing values: it effectively calculates the exponentially weighted moving average on the compacted series, as if the missing values had never been there. I think this is incorrect. Values should be weighted based on their absolute location, so that a gap still ages the observations that precede it.

In the code below, I reproduce the ewma() calculation using "wrong" weights, and then show what I believe the correct result should be using the "right" weights.

In [1]: from pandas import Series, ewma

In [2]: def simple_wma(x, w):
   ...:     return x.multiply(w).cumsum() / w.cumsum()
   ...:

In [3]: s = Series([0, None, 100])

In [4]: com = 2

In [5]: alpha = 1.0/(1+com)  # 1.0 forces float division, so alpha is 1/3 under Python 2 as well

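In [6]: wrong_weights_adjust_false = Series([(1-alpha), None, alpha])  # gap ignored: first value decays once

In [7]: wrong_weights_adjust_true = Series([(1-alpha), None, 1])  # gap ignored: first value decays once

In [8]: right_weights_adjust_false = Series([(1-alpha)**2, None, alpha])  # gap counted: first value decays twice

In [9]: right_weights_adjust_true = Series([(1-alpha)**2, None, 1])  # gap counted: first value decays twice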

In [10]: ewma(s, com=com, adjust=False)
Out[10]:
0     0.000000
1     0.000000
2    33.333333
dtype: float64

In [11]: simple_wma(s, wrong_weights_adjust_false)
Out[11]:
0     0.000000
1          NaN
2    33.333333
dtype: float64

In [12]: simple_wma(s, right_weights_adjust_false)
Out[12]:
0     0.000000
1          NaN
2    42.857143
dtype: float64

In [13]: ewma(s, com=com, adjust=True)
Out[13]:
0     0
1     0
2    60
dtype: float64

In [14]: simple_wma(s, wrong_weights_adjust_true)
Out[14]:
0     0
1   NaN
2    60
dtype: float64

In [15]: simple_wma(s, right_weights_adjust_true)
Out[15]:
0     0.000000
1          NaN
2    69.230769
dtype: float64

In [16]: s2 = Series([0, 100])

In [17]: ewma(s2, com=com, adjust=False)
Out[17]:
0     0.000000
1    33.333333
dtype: float64

In [18]: ewma(s2, com=com, adjust=True)
Out[18]:
0     0
1    60
dtype: float64
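
The two weighting schemes compared above can also be written as a single loop, which may make the difference easier to see for longer series. The following is only a minimal pure-Python sketch, not the pandas implementation: the function name `ewma_sketch` and the `ignore_gaps` flag are hypothetical names introduced here for illustration, and missing values are assumed to be `None` or NaN. Like ewma() in the session above, it carries the last average forward at a missing position.

```python
import math


def ewma_sketch(values, com, adjust=True, ignore_gaps=False):
    """Exponentially weighted moving average over a list that may contain gaps.

    Hypothetical illustration, not the pandas code.
    ignore_gaps=True  -> weights decay once per observed value (the "wrong" weights)
    ignore_gaps=False -> weights decay once per position (the "right" weights)
    """
    alpha = 1.0 / (1.0 + com)
    decay = 1.0 - alpha
    num = 0.0      # running sum of weight * value
    den = 0.0      # running sum of weights
    seen = False   # True once the first non-missing value has been consumed
    out = []
    for x in values:
        missing = x is None or (isinstance(x, float) and math.isnan(x))
        # Age everything accumulated so far, unless this is a gap we ignore.
        if seen and not (missing and ignore_gaps):
            num *= decay
            den *= decay
        if not missing:
            # The first observation always enters with weight 1; later ones
            # enter with weight 1 (adjust=True) or alpha (adjust=False).
            w = 1.0 if (adjust or not seen) else alpha
            num += w * x
            den += w
            seen = True
        out.append(num / den if seen else float("nan"))
    return out


s = [0, None, 100]
print(ewma_sketch(s, com=2, adjust=False, ignore_gaps=True))   # last value 33.333333
print(ewma_sketch(s, com=2, adjust=False, ignore_gaps=False))  # last value 42.857143
print(ewma_sketch(s, com=2, adjust=True, ignore_gaps=True))    # last value 60.0
print(ewma_sketch(s, com=2, adjust=True, ignore_gaps=False))   # last value 69.230769
```

With `ignore_gaps=True` the sketch gives the same final values as ewma() in Out[10] and Out[13]; with `ignore_gaps=False` it gives the values from Out[12] and Out[15] that I believe are correct.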