[Python-Dev] Fwd: [Python-ideas] stats module Was: minmax() function ...

geremy condra debatem1 at gmail.com
Sat Oct 16 02:05:26 CEST 2010

On Fri, Oct 15, 2010 at 1:00 PM, Raymond Hettinger <raymond.hettinger at gmail.com> wrote:

Hello guys. If you don't mind, I would like to hijack your thread :-)

ISTM that the minmax() idea is really just an optimization request. A single-pass minmax() is easily coded in simple, pure Python, so really the discussion is about how to remove the loop overhead (there isn't much you can do about the cost of the two compares, which is where most of the time would be spent anyway). My suggestion is to aim higher. There is no reason a single pass couldn't also return min/max/len/sum and perhaps even other summary statistics like sum(x**2), so that you can compute standard deviation and variance.
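A rough sketch of what such a single pass might look like in pure Python. The name summarize() and its return order are made up for illustration and are not a proposed API. Note that deriving the variance from sum(x**2) can lose precision through cancellation when the mean is large relative to the spread, so this only shows the idea.

```python
def summarize(iterable):
    """Return (min, max, count, total, total_sq) from one pass over iterable."""
    it = iter(iterable)
    try:
        first = next(it)
    except StopIteration:
        raise ValueError("summarize() arg is an empty sequence")
    lo = hi = first
    n = 1
    total = first
    total_sq = first * first
    for x in it:
        # At most two comparisons per element: x cannot be both below lo and above hi.
        if x < lo:
            lo = x
        elif x > hi:
            hi = x
        n += 1
        total += x
        total_sq += x * x
    return lo, hi, n, total, total_sq


lo, hi, n, total, total_sq = summarize([2, 4, 4, 4, 5, 5, 7, 9])
mean = total / n
variance = total_sq / n - mean ** 2  # population variance from the running sums
```

For this sample the running sums are 40 and 232 over 8 items, which gives a mean of 5 and a population variance of 4.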

+1 from me. Here's a normal cdf and chi squared cdf approximation I use for randomness testing. They may need to be refined for inclusion, but you're welcome to use them if you'd like.

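from math import sqrt, erf

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative distribution function of the normal distribution."""
    # Standardize with (x - mu); float literals avoid integer division.
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

def chi_squared_cdf(x, k):
    """Approximates the chi-squared CDF with k degrees of freedom
    (Wilson-Hilferty cube-root normal approximation)."""
    # (x/k)**(1/3) is roughly normal with mean 1 - 2/(9k) and variance 2/(9k).
    numerator = (x / float(k)) ** (1.0 / 3.0) - (1.0 - 2.0 / (9.0 * k))
    denominator = sqrt(2.0 / (9.0 * k))
    return normal_cdf(numerator / denominator)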

A few years ago, Guido and other Python devvers supported a proposal I made to create a stats module, but I didn't have time to develop it. The basic idea was that Python's batteries should include most of the functionality available on advanced student calculators. Another idea behind it was that we could invisibly do the right thing under the hood to help users avoid numerical problems (e.g. math.fsum(s)/len(s) is a more accurate way to compute an average because it doesn't lose precision when building up the intermediate sums).
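A small illustration of the fsum point, as a sketch only. The helper names naive_sum() and mean() are hypothetical. The explicit loop stands in for plain left-to-right float accumulation, since recent CPython releases make the built-in sum() more accurate for floats and could hide the effect.

```python
from math import fsum

def naive_sum(values):
    """Left-to-right float accumulation; rounding error builds up at each step."""
    total = 0.0
    for v in values:
        total += v
    return total

def mean(values):
    """Average computed with fsum, which tracks the lost low-order bits."""
    return fsum(values) / len(values)

values = [0.1] * 10
print(naive_sum(values))  # 0.9999999999999999
print(fsum(values))       # 1.0
print(mean(values))       # 0.1
```

The user never has to know fsum exists; a stats module could simply use it inside its own mean().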

Can you give some other examples? Sage does some of this and I frequently find it annoying, actually, but I'm not sure if you're referring to the same things there.

Geremy Condra
