BUG: Inconsistencies in calculating second moments of a single value

I noticed (in #7884) that ewmvar, ewmstd, ewmvol, ewmcov, rolling_var, and rolling_std return 0.0 for a single value (assuming min_periods=0), whereas Series.std, Series.var, ewmcorr, expanding_cov, expanding_corr, rolling_cov, and rolling_corr all return NaN for a single value. expanding_std and expanding_var instead raise ValueError: min_periods (2) must be <= window (1).

I think all of these should return NaN for a single value. At any rate, I would expect greater consistency one way or the other.
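
The case for NaN can be sketched in plain Python. This is only an illustration of the unbiased (n - 1) estimator, not pandas' actual implementation; the helper name sample_var and its ddof parameter are assumptions made for the example. With a single observation the denominator n - ddof is zero, so the unbiased estimate is undefined, whereas the population variance (ddof=0) of one value is legitimately 0.0.

```python
import math


def sample_var(values, ddof=1):
    """Variance with an (n - ddof) denominator; NaN when it is undefined.

    Hypothetical helper for illustration only, not a pandas function.
    """
    n = len(values)
    if n - ddof <= 0:
        # Not enough observations for this estimator: 0 / 0 is undefined.
        return math.nan
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - ddof)


single_unbiased = sample_var([1.0])            # undefined, so NaN
single_population = sample_var([1.0], ddof=0)  # well defined, 0.0
pair_unbiased = sample_var([1.0, 3.0])         # (1 + 1) / (2 - 1)
```

Under this reading, the functions returning 0.0 behave like the ddof=0 estimator for a single value, while the ones returning NaN behave like the ddof=1 estimator.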

On a mildly related note, when calculating the correlation of a constant series with itself, Series.corr(), expanding_corr, and rolling_corr return NaN, while ewmcorr sometimes returns NaN, sometimes 1, and sometimes -1, due to numerical accuracy issues. In fact, as shown in a separate comment below, rolling_corr also produces inconsistent results for a constant subseries that follows different prior values.
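
The mix of NaN, 1, and -1 is what one would expect when a 0/0 ratio is perturbed by rounding. Here is a minimal sketch, assuming the correlation is formed as cov / sqrt(var_x * var_y); that is an assumption about the general approach, not a copy of the pandas code, and corr_from_moments is a hypothetical helper. The residual 2**-56 stands in for accumulated floating-point error and is chosen because it is exactly representable in binary.

```python
import math


def corr_from_moments(cov, var_x, var_y):
    """Correlation from second moments; NaN when the denominator is zero.

    Hypothetical helper for illustration only, not a pandas function.
    """
    denom = math.sqrt(var_x * var_y)
    if denom == 0.0:
        return math.nan
    return cov / denom


# Exact arithmetic on a constant series: every second moment is zero.
exact = corr_from_moments(0.0, 0.0, 0.0)

# Tiny rounding residuals in place of the zeros (assumed magnitudes).
eps = 2.0 ** -56
positive = corr_from_moments(eps, eps, eps)
negative = corr_from_moments(-eps, eps, eps)
```

Here exact is NaN, while positive and negative come out as 1.0 and -1.0, so the sign of the result depends entirely on which way the rounding error falls.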

Inconsistencies in calculating second moments of a single point:

Python 3.4.1 (v3.4.1:c0e311e010fc, May 18 2014, 10:45:13) [MSC v.1600 64 bit (AMD64)]

In [1]: from pandas import Series, ewmvar, ewmstd, ewmvol, ewmcov, rolling_var, rolling_std, ewmcorr, expanding_cov, expanding_corr, expanding_std, expanding_var, rolling_cov, rolling_corr
In [2]: s = Series([1.])

In [3]: ewmvar(s, halflife=2., min_periods=0)
Out[3]:
0    0
dtype: float64

In [4]: ewmstd(s, halflife=2., min_periods=0)
Out[4]:
0    0
dtype: float64

In [5]: ewmvol(s, halflife=2., min_periods=0)
Out[5]:
0    0
dtype: float64

In [6]: ewmcov(s, s, halflife=2., min_periods=0)
Out[6]:
0    0
dtype: float64

In [7]: rolling_var(s, window=2, min_periods=0)
Out[7]:
0    0
dtype: float64

In [8]: rolling_std(s, window=2, min_periods=0)
Out[8]:
0    0
dtype: float64

In [9]: s.std()
Out[9]: nan

In [10]: s.var()
Out[10]: nan

In [11]: ewmcorr(s, s, halflife=2., min_periods=0)
Out[11]:
0   NaN
dtype: float64

In [12]: expanding_cov(s, s, min_periods=0)
Out[12]:
0   NaN
dtype: float64

In [13]: expanding_corr(s, s, min_periods=0)
Out[13]:
0   NaN
dtype: float64

In [16]: rolling_cov(s, s, window=3, min_periods=0)
Out[16]:
0   NaN
dtype: float64

In [17]: rolling_corr(s, s, window=3, min_periods=0)
Out[17]:
0   NaN
dtype: float64


In [14]: expanding_std(s, min_periods=0)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-14-7320ee579e6a> in <module>()
----> 1 expanding_std(s, min_periods=0)

C:\Python34\lib\site-packages\pandas\stats\moments.py in f(arg, min_periods, freq, center, **kwargs)
    825             return func(arg, window, minp, **kwds)
    826         return _rolling_moment(arg, window, call_cython, min_periods, freq=freq,
--> 827                                center=center, **kwargs)
    828
    829     return f

C:\Python34\lib\site-packages\pandas\stats\moments.py in _rolling_moment(arg, window, func, minp, axis, freq, center, how, args, kwargs, **kwds)
    345         result = np.apply_along_axis(calc, axis, values)
    346     else:
--> 347         result = calc(values)
    348
    349     rs = return_hook(result)

C:\Python34\lib\site-packages\pandas\stats\moments.py in <lambda>(x)
    339     arg = _conv_timerule(arg, freq, how)
    340     calc = lambda x: func(x, window, minp=minp, args=args, kwargs=kwargs,
--> 341                           **kwds)
    342     return_hook, values = _process_data_structure(arg)
    343     # actually calculate the moment. Faster way to do this?

C:\Python34\lib\site-packages\pandas\stats\moments.py in call_cython(arg, window, minp, args, kwargs, **kwds)
    823         def call_cython(arg, window, minp, args=(), kwargs={}, **kwds):
    824             minp = check_minp(minp, window)
--> 825             return func(arg, window, minp, **kwds)
    826         return _rolling_moment(arg, window, call_cython, min_periods, freq=freq,
    827                                center=center, **kwargs)

C:\Python34\lib\site-packages\pandas\stats\moments.py in <lambda>(*a, **kw)
    604                                how='median')
    605
--> 606 _ts_std = lambda *a, **kw: _zsqrt(algos.roll_var(*a, **kw))
    607 rolling_std = _rolling_func(_ts_std, 'Unbiased moving standard deviation.',
    608                             check_minp=_require_min_periods(1))

C:\Python34\lib\site-packages\pandas\algos.pyd in pandas.algos.roll_var (pandas\algos.c:28990)()

C:\Python34\lib\site-packages\pandas\algos.pyd in pandas.algos._check_minp (pandas\algos.c:16394)()

ValueError: min_periods (2) must be <= window (1)

In [15]: expanding_var(s, min_periods=0)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-15-28e2018aee68> in <module>()
----> 1 expanding_var(s, min_periods=0)

C:\Python34\lib\site-packages\pandas\stats\moments.py in f(arg, min_periods, freq, center, **kwargs)
    825             return func(arg, window, minp, **kwds)
    826         return _rolling_moment(arg, window, call_cython, min_periods, freq=freq,
--> 827                                center=center, **kwargs)
    828
    829     return f

C:\Python34\lib\site-packages\pandas\stats\moments.py in _rolling_moment(arg, window, func, minp, axis, freq, center, how, args, kwargs, **kwds)
    345         result = np.apply_along_axis(calc, axis, values)
    346     else:
--> 347         result = calc(values)
    348
    349     rs = return_hook(result)

C:\Python34\lib\site-packages\pandas\stats\moments.py in <lambda>(x)
    339     arg = _conv_timerule(arg, freq, how)
    340     calc = lambda x: func(x, window, minp=minp, args=args, kwargs=kwargs,
--> 341                           **kwds)
    342     return_hook, values = _process_data_structure(arg)
    343     # actually calculate the moment. Faster way to do this?

C:\Python34\lib\site-packages\pandas\stats\moments.py in call_cython(arg, window, minp, args, kwargs, **kwds)
    823         def call_cython(arg, window, minp, args=(), kwargs={}, **kwds):
    824             minp = check_minp(minp, window)
--> 825             return func(arg, window, minp, **kwds)
    826         return _rolling_moment(arg, window, call_cython, min_periods, freq=freq,
    827                                center=center, **kwargs)

C:\Python34\lib\site-packages\pandas\algos.pyd in pandas.algos.roll_var (pandas\algos.c:28990)()

C:\Python34\lib\site-packages\pandas\algos.pyd in pandas.algos._check_minp (pandas\algos.c:16394)()

ValueError: min_periods (2) must be <= window (1)


In [20]: show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.4.1.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.14.1
nose: 1.3.3
Cython: 0.20.2
numpy: 1.9.0b1
scipy: 0.14.0
statsmodels: 0.5.0
IPython: 2.1.0
sphinx: 1.2.2
patsy: 0.3.0
scikits.timeseries: None
dateutil: 2.2
pytz: 2014.4
bottleneck: 0.8.0
tables: 3.1.1
numexpr: 2.4
matplotlib: 1.3.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
rpy2: None
sqlalchemy: 0.9.7
pymysql: None
psycopg2: None

Instability in ewmcorr of a constant series with itself:

In [1]: from pandas import Series, ewmcorr, expanding_corr, rolling_corr

In [2]: s = Series([5., 5., 5.])

In [3]: s.corr(s)
Out[3]: nan

In [4]: expanding_corr(s, s)
Out[4]:
0   NaN
1   NaN
2   NaN
dtype: float64

In [5]: rolling_corr(s, s, window=3)
Out[5]:
0   NaN
1   NaN
2   NaN
dtype: float64

In [6]: ewmcorr(s, s, halflife=3.)
Out[6]:
0   -1
1   -1
2   -1
dtype: float64

In [9]: ewmcorr(Series([3., 3., 3.]), halflife=3.)
Out[9]:
0   NaN
1     1
2     1
dtype: float64