Rolling skewness and kurtosis fail on a sample of all equal values · Issue #5749 · pandas-dev/pandas
For a sample of data like this:
Both of these throw an exception (during an attempt to divide by zero):
pd.rolling_skew(d, window=25)
pd.rolling_kurt(d, window=25)
The issue is in algos.pyx: there are no checks for what amounts to zero variance in the data. If one value occurs at least as many times in a row as the size of the window, the entire rolling computation fails, rather than just returning NaN for that one period (which is what I'd expect). For reference, scipy gives a kurtosis of -3 and a skewness of 0 (plus a warning) for this situation, which is not what I'd expect either, since the higher moments are all zero, implying a division by zero.
>>> from scipy import stats
>>> stats.kurtosis([1,1,1,1,1,1,1])
-3.0
>>> stats.skew([1,1,1,1,1,1,1])
/usr/lib/python2.7/dist-packages/scipy/stats/stats.py:1067: RuntimeWarning: invalid value encountered in double_scalars
vals = np.where(zero, 0, m3 / m2**1.5)
0.0
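To make the expected behaviour concrete, here is a minimal pure-Python sketch of a rolling skewness that returns NaN for a zero-variance window instead of failing. It is not the pandas or scipy implementation: `window_skew`, `rolling_skew_sketch`, and the `tol` parameter are hypothetical names, and it uses the biased definition m3 / m2**1.5 that appears in the scipy warning above.

```python
import math


def window_skew(values, tol=0.0):
    """Biased skewness m3 / m2**1.5 of one window.

    Returns NaN when the variance m2 is zero (or <= tol), since the
    ratio is then 0 / 0. `tol` is a hypothetical knob, not a pandas option.
    """
    n = len(values)
    if n == 0:
        return math.nan
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 <= tol:
        return math.nan  # zero variance: skewness is undefined
    return m3 / m2 ** 1.5


def rolling_skew_sketch(data, window):
    """Apply window_skew over a trailing window; NaN until the window is full."""
    out = []
    for i in range(len(data)):
        if i + 1 < window:
            out.append(math.nan)
        else:
            out.append(window_skew(data[i + 1 - window:i + 1]))
    return out


# A run of equal values followed by some variation.
sample = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 4.0]
result = rolling_skew_sketch(sample, window=3)
```

Only the periods whose window is constant come back as NaN; later periods are still computed. With values that are not exactly representable in floating point, the variance of a constant window may come out as a tiny nonzero number, so a real implementation would probably want a nonzero tolerance.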
Below is the approach I was taking to weed out any possible divide-by-zero issues. I'll submit a proper pull request tomorrow; in the meantime, this is here in case I can get any feedback, preferably on whether these added conditions are enough (I think the kurtosis could still break) and on how to add some tests for both of these.
diff --git a/pandas/algos.pyx b/pandas/algos.pyx
index 08ec707..78b619f 100644
--- a/pandas/algos.pyx
+++ b/pandas/algos.pyx
@@ -1160,7 +1160,7 @@ def roll_skew(ndarray[double_t] input, int win, int minp):
nobs -= 1
- if nobs >= minp:
+ if nobs >= minp and not (x == 0 and xx == 0) and nobs != 2:
A = x / nobs
B = xx / nobs - A * A
C = xxx / nobs - A * A * A - 3 * A * B
@@ -1227,7 +1227,7 @@ def roll_kurt(ndarray[double_t] input,
nobs -= 1
- if nobs >= minp:
+ if nobs >= minp and not (x == 0 and xx == 0) and nobs != 2:
A = x / nobs
R = A * A
B = xx / nobs - R
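One way to check whether the added conditions are enough is to replay the running sums in plain Python. The sketch below is not the Cython code: `running_sums`, `guard_from_diff`, and `variance_guard` are hypothetical helpers, and only the expressions for `A` and `B` and the guard condition are taken from the diff. It suggests that `x == 0 and xx == 0` only catches a window of all zeros, while a window of any other constant value still reaches the division with `B == 0`.

```python
def running_sums(window):
    """Return (nobs, x, xx): count, sum, and sum of squares of one window."""
    nobs = len(window)
    x = sum(window)
    xx = sum(v * v for v in window)
    return nobs, x, xx


def guard_from_diff(nobs, x, xx, minp):
    # Condition copied from the proposed change above.
    return nobs >= minp and not (x == 0 and xx == 0) and nobs != 2


def variance_guard(nobs, x, xx, minp):
    # Hypothetical alternative: test the variance term B directly.
    if nobs == 0 or nobs < minp or nobs == 2:
        return False
    A = x / nobs
    B = xx / nobs - A * A
    return B > 0


zeros = [0.0] * 25
fives = [5.0] * 25
varied = [float(v) for v in range(25)]

zeros_pass = guard_from_diff(*running_sums(zeros), 25)        # all-zero window
fives_pass = guard_from_diff(*running_sums(fives), 25)        # constant, nonzero
fives_pass_alt = variance_guard(*running_sums(fives), 25)
varied_pass_alt = variance_guard(*running_sums(varied), 25)
```

Here `zeros_pass` is False but `fives_pass` is True, so the constant window of fives would still be computed, whereas the variance-based check rejects it and still accepts the varied window. The same four cases could serve as a starting point for tests: an all-zero window, a constant nonzero window, a window with `nobs == 2`, and an ordinary window. Because `B` is a difference of two nearly equal numbers, rounding can leave it slightly above or below zero for real data, so an exact comparison may need a tolerance.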