Improve accuracy of builtin sum() for float inputs (original) (raw)

Currently sum() makes no efforts to improve accuracy over a simple running total. We do have math.fsum() that makes extreme efforts to be almost perfect; however, that function isn't well known, it runs 10x slower than regular sum(), and it always coerces to a float.

I suggest switching the builtin sum() handling of float inputs to Arnold Neumaier's improved variation of compensated summation. Per his paper, this algorithm has excellent error bounds (though not as perfect as fsum():

|s - š| ≤ ε|s| + ε²(3/4n² + n)·Σ|aᵢ|               (IV,12)
|s - š| ≤ ε|s| + ε²(1/4n³ + 5/2n² + n)·Max|aᵢ|     (IV,13)

The compensation tracking runs in parallel to the main accumulation. And except for pathological cases, the branch is predictable, making the test essentially free. Thanks to the magic of instruction level parallelism and branch prediction, this improvement has zero cost on my Apple M1 build. Timed with:

./python.exe -m timeit -s 'from random import expovariate as r' -s 'd=[r(1.0) for i in range(10_000)]' 'sum(d)'

N.B. Numpy switched from a simple running total to partial pairwise summation. That isn't as accurate as what is being proposed here, but it made more sense for them because the extra work of Neumaier summation isn't masked by the overhead of fetching values from an iterator as we do here. Also with an iterator, we can't do pairwise summation without using auxiliary memory.

Improve accuracy of builtin sum() for float inputs (original) (raw)

Linked PRs