Cython function group_var_float64 may return small negative values · Issue #10448 · pandas-dev/pandas

For certain well-chosen inputs, group_var_float64 may return small negative values due to roundoff error. This then interferes with e.g. computing aggregate standard deviations, since the square root of a negative variance is NaN:

    import pandas as pd
    import numpy as np

    df = pd.DataFrame({'a': (0, 0, 0), 'b': 0.832845131556193 * np.ones(3)})
    df.groupby('a').std()

        b
    a
    0 NaN

To see the cause of this in isolation, consider:

    import numpy as np
    from pandas.algos import group_var_float64

    out = np.array([[0.0]], dtype=np.float64)
    counts = np.array([0])
    values = 0.832845131556193 * np.ones((3, 1), dtype=np.float64)
    labels = np.zeros(3, dtype=np.int)

    group_var_float64(out, counts, values, labels)

    print(out)  # This prints [[ -1.48029737e-16]] on my machine.
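For readers without access to the Cython routine, here is a rough NumPy-only sketch of how this kind of roundoff can arise. It assumes the variance is computed with a one-pass sum-of-squares formula; that is an assumption made for illustration, not something stated in this report. The names `one_pass` and `two_pass` are hypothetical.

```python
import numpy as np

# The value from the report, repeated three times within a single group.
x = 0.832845131556193 * np.ones(3, dtype=np.float64)
n = x.size

# One-pass ("textbook") formula: (n * sum(x^2) - sum(x)^2) / (n^2 - n).
# ASSUMPTION: shown only to illustrate cancellation; the actual kernel
# may differ. The two large terms nearly cancel, so the result can land
# slightly below zero.
sum_x = x.sum()
sum_xx = (x * x).sum()
one_pass = (n * sum_xx - sum_x * sum_x) / (n * n - n)

# Two-pass formula: a sum of squared deviations, so it can never be negative.
mean = sum_x / n
two_pass = ((x - mean) ** 2).sum() / (n - 1)

print(one_pass, two_pass)
```

The exact value of `one_pass` depends on the platform and the order of summation, so it may or may not come out negative on a given machine; `two_pass` is non-negative by construction.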

The fix for this should be easy: clamp negative values to zero. I can provide a fix (+ tests) if needed.
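A minimal sketch of the proposed clamp, written in plain NumPy rather than Cython. The helper name `safe_std` is hypothetical and is not part of pandas; it only illustrates the idea of zeroing negative variances before taking the square root.

```python
import numpy as np

def safe_std(var):
    """Return sqrt(var), treating negative variances as roundoff and clamping them to 0.

    Hypothetical helper for illustration; not a pandas function.
    """
    var = np.asarray(var, dtype=np.float64)
    # np.maximum propagates NaN, so genuinely missing results stay NaN.
    clamped = np.maximum(var, 0.0)
    return np.sqrt(clamped)

# A tiny negative variance (as in the report), a normal one, and a missing one.
var = np.array([[-1.48029737e-16], [0.25], [np.nan]])
std = safe_std(var)
print(std)
```

Clamping everything below zero also hides any large negative value that would point to a real bug, so a fix along these lines may want to check the magnitude in its tests.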

INSTALLED VERSIONS

commit: None
python: 2.7.6.final.0
python-bits: 64
OS: Darwin
OS-release: 14.3.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: nl_BE.UTF-8

pandas: 0.16.1
nose: 1.3.4
Cython: 0.22
numpy: 1.9.2
scipy: 0.15.1
statsmodels: None
IPython: 3.1.0
sphinx: 1.3.1
patsy: None
dateutil: 2.4.2
pytz: 2014.9
bottleneck: None
tables: 3.1.1
numexpr: 2.4
matplotlib: 1.4.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.3.2
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 1.0.4
pymysql: None
psycopg2: None