[Python-Dev] Accumulation module (original) (raw)
Raymond Hettinger python at rcn.com
Wed Jan 14 07:40:43 EST 2004
- Previous message: [Python-Dev] Re: Accumulation module
- Next message: [Python-Dev] Accumulation module
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
> * Is there a way to compute the standard deviation without multiple > passes over the data (one to compute the mean and a second to sum the > squares of the differences from the mean?
[Tim]
The textbook formula does it in one pass, by accumulating running totals of the elements and of their squares. It's numerically atrocious, though, and should never be used (it can even end up delivering a negative result!). Knuth gives a much better recurrence formula for this, doing one-pass computation of variance in a numerically stable way.
I found it.
There is a suggestion that the formula would be better off if the recurrence sums are accumulated along with a cumulative error term that tracks the loss of significance when adding a small adjustment to a larger sum. The textbook exercise indicates that that technique is the most useful in the presence of crummy rounding. Do you think there is any value in using the error term tracking?
def stddev(data): # Knuth -- Seminumerical Algorithms 4.2.2 p.216 it = iter(data) m = float(it.next()) # new running mean s = 0.0 # running sum((x-mean)**2) k = 1 # number of items dm = ds = 0 # carried forward error term for x in it: k += 1
adjm = (x-m)/k - dm
newm = m + adjm
dm = (newm - m) - adjm
adjs = (x-m)*(x-newm) - ds
news = s + adjs
ds = (news - s) - adjs
s = news
m = newm
return (s / (k-1)) ** 0.5 # sample standard deviation
Without the error term tracking, the inner loop simplifies to:
k += 1
newm = m+(x-m)/k
s += (x-m)*(x-newm)
m = newm
Raymond Hettinger
- Previous message: [Python-Dev] Re: Accumulation module
- Next message: [Python-Dev] Accumulation module
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]