[Python-Dev] Accumulation module

Raymond Hettinger python at rcn.com
Wed Jan 14 03:24:41 EST 2004


> * What to call the module

[Aahz]

stats

There is already a stat module. Any chance of confusion?

The other naming issue is that some of the functions have non-statistical uses: product() is general purpose; nlargest() and nsmallest() will accept any datatype (though most of the use cases are with numbers). Are there other general purpose (non-statistical) accumulation/reduction formulas that go here?

> * What else should be in it?

[Matthias Klose]

you may want to have a look at http://www.nmr.mgh.harvard.edu/NeuralSystemsGroup/gary/python.html

Ages ago, when the idea for this module first arose, a certain bot recommended strongly against including any but the most basic statistical functions (muttering something about the near impossibility of doing it well in either Python or portable C and something about not wanting to maintain anything that wasn't dirt simple). His words would have of course fallen on deaf ears, but a certain dictatorial type had just finished teaching advanced programming skills to people who couldn't operate a high school calculator. Sooooo, no Kurtosis for you, no gamma function for me!

It's possible that chi-square or regression could slip in, but it would require considerable cheerleading and a rare planetary alignment.

> * What else should be in it?

[Jeremy]

median()

And a function like bins() or histogram() that accumulates the values in buckets of some size.

That sounds beginner-simple and reasonably useful, though it would have been nice if all the reduction formulas could work in one pass and never need to manifest the whole dataset in memory.
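
For illustration only, here is a minimal sketch of how a bucket-counting function could run in a single pass over any iterable without holding the data in memory. The name histogram(), the bin_width parameter, and the choice to key each bucket by its lower edge are assumptions made for this example, not a proposed API, and the code is written for a current Python rather than the 2004 interpreter.

```python
import math
from collections import defaultdict


def histogram(iterable, bin_width):
    """Count values into buckets of width bin_width in one pass.

    Hypothetical helper: the name and signature are illustrative only.
    Returns a dict mapping each bucket's lower edge to its count.
    """
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    counts = defaultdict(int)
    for value in iterable:
        # Lower edge of the bucket that contains this value.
        bucket = math.floor(value / bin_width) * bin_width
        counts[bucket] += 1
    return dict(counts)


# Works on an iterator, so the input is consumed once and never stored.
result = histogram(iter([1, 2, 5, 11, 12, 27]), 10)
```

Only the bucket counts are kept, so memory use grows with the number of distinct buckets rather than with the size of the dataset.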

> Note, heapq is used for both (I use operator.neg to swap between largest and smallest).

[Bernhard Herzog]

Does that mean nlargest/nsmallest only work for numbers? I think it might be useful for e.g. strings too.

The plan was to make them work with anything defining __lt__; however, if it is coded in Python and uses heapq, I don't see a straightforward way around using operator.neg without wrapping everything in some sort of sense-reverser object.
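
As a rough sketch of what such a sense-reverser might look like: the wrapper below swaps the operands of the less-than comparison so that heapq's min-heap behaves like a max-heap. The class name Reversed and both function bodies are hypothetical, written for a current Python, and are not the implementation under discussion. One observation from the sketch is that only nsmallest() needs the wrapper; nlargest() can keep a plain min-heap of the n best items seen so far.

```python
import heapq


class Reversed:
    """Hypothetical wrapper that inverts the sense of < for its value."""
    __slots__ = ("value",)

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        # heapq compares with < only, so swapping operands reverses the order.
        return other.value < self.value


def nlargest(n, iterable):
    """Return the n largest items, largest first, using only <."""
    if n <= 0:
        return []
    heap = []  # min-heap: heap[0] is the smallest item kept so far
    for item in iterable:
        if len(heap) < n:
            heapq.heappush(heap, item)
        elif heap[0] < item:
            heapq.heapreplace(heap, item)
    return sorted(heap, reverse=True)


def nsmallest(n, iterable):
    """Return the n smallest items, smallest first, using only <."""
    if n <= 0:
        return []
    heap = []  # heap of wrappers: heap[0] wraps the largest item kept so far
    for item in iterable:
        if len(heap) < n:
            heapq.heappush(heap, Reversed(item))
        elif item < heap[0].value:
            heapq.heapreplace(heap, Reversed(item))
    return sorted(wrapper.value for wrapper in heap)


words = ["pear", "apple", "fig", "kiwi"]
largest = nlargest(2, words)
smallest = nsmallest(2, words)
```

Because nothing here negates a value, the same code handles strings or any other type that defines less-than, at the cost of one small wrapper object per retained item in nsmallest().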

Raymond Hettinger


