[Python-Dev] PEP 450 adding statistics module

Oscar Benjamin oscar.j.benjamin at gmail.com
Mon Sep 9 11:00:24 CEST 2013


On 9 September 2013 04:16, Guido van Rossum <guido at python.org> wrote:

> Yeah, so this and Steven's review of various other APIs suggests that the field of statistics hasn't really reached the object-oriented age (or perhaps the OO view isn't suitable for the field), and people really think of their data as a matrix of some sort. We should respect that. Now, if this was NumPy, it would still make sense to require a single argument, to be interpreted in the usual fashion. So I'm using that as a kind of leverage to still recommend taking a list of pairs instead of a pair of lists. Also, it's quite likely that at least some of the users of the new statistics module will be more familiar with OO programming (e.g. the Python DB API, PEP 249) than they are with other statistics packages.

I'm not sure I understand what you mean by this. NumPy has built everything on top of a core ndarray class whose methods make the issues about multivariate stats APIs trivial. The transpose of an array A is simply the attribute A.T, which is both convenient and cheap since it's just an alternate view on the underlying buffer.
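For comparison, here is a minimal sketch of the same transposition in plain Python, with no ndarray available. The sample values are just the Arizona and Dakota figures from the session below; the point is only that the "list of pairs" and "pair of lists" layouts convert into each other with zip, at the cost of a copy rather than a cheap view.

```python
# List of pairs: one (x, y) tuple per observation.
pairs = [(123.0, 456.0), (234.0, 345.0), (345.0, 567.0)]

# Pair of lists: one sequence per variable. zip(*...) transposes, but unlike
# ndarray.T it builds new tuples instead of a view on shared storage.
xs, ys = zip(*pairs)

# Transposing again recovers the original layout.
round_trip = list(zip(xs, ys))
```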

NumPy also provides record arrays that let you use names instead of numeric indices:

>>> import numpy as np
>>> dt = np.dtype([('Year', int), ('Arizona', float), ('Dakota', float)])
>>> a = np.array([(2001, 123., 456.), (2002, 234., 345), (2003, 345., 567)], dt)
>>> a
array([(2001, 123.0, 456.0), (2002, 234.0, 345.0), (2003, 345.0, 567.0)],
      dtype=[('Year', '<i4'), ('Arizona', '<f8'), ('Dakota', '<f8')])
>>> a['Year']
array([2001, 2002, 2003])
>>> a['Arizona']
array([ 123.,  234.,  345.])
>>> np.corrcoef(a['Arizona'], a['Dakota'])
array([[ 1. ,  0.5],
       [ 0.5,  1. ]])
>>> included = a[a['Year'] > 2001]
>>> included
array([(2002, 234.0, 345.0), (2003, 345.0, 567.0)],
      dtype=[('Year', '<i4'), ('Arizona', '<f8'), ('Dakota', '<f8')])
>>> np.corrcoef(included['Arizona'], included['Dakota'])
array([[ 1.,  1.],
       [ 1.,  1.]])

So perhaps the statistics module could have a similar NameTupleArray type that can easily be loaded from and saved to a CSV file, and that makes it easy to put your data in whatever form is required.
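To make the idea concrete, here is a rough sketch of what such a type might look like using only the standard library. Everything in it is hypothetical: the class name NamedTupleArray, the methods from_csv, to_csv, column and where, and the explicit per-column type list are illustrative assumptions, not anything proposed in PEP 450.

```python
import csv
import io
from collections import namedtuple


class NamedTupleArray:
    """Hypothetical container: rows are namedtuples, columns are looked up by name."""

    def __init__(self, fields, rows):
        self.Row = namedtuple("Row", fields)
        self.rows = [self.Row(*row) for row in rows]

    @classmethod
    def from_csv(cls, fileobj, types):
        # The first CSV line holds the column names; `types` converts each column.
        reader = csv.reader(fileobj)
        fields = next(reader)
        rows = [[t(v) for t, v in zip(types, row)] for row in reader]
        return cls(fields, rows)

    def to_csv(self, fileobj):
        writer = csv.writer(fileobj)
        writer.writerow(self.Row._fields)
        writer.writerows(self.rows)

    def column(self, name):
        # One variable as a plain list, ready to pass to a statistics function.
        return [getattr(row, name) for row in self.rows]

    def where(self, predicate):
        # Keep only the rows for which predicate(row) is true.
        return NamedTupleArray(self.Row._fields, filter(predicate, self.rows))


text = "Year,Arizona,Dakota\n2001,123,456\n2002,234,345\n2003,345,567\n"
data = NamedTupleArray.from_csv(io.StringIO(text), (int, float, float))
included = data.where(lambda row: row.Year > 2001)
arizona = included.column("Arizona")
```

With something along these lines, included.column("Arizona") and included.column("Dakota") give the "pair of lists" form, while included.rows is already the "list of tuples" form, so either calling convention is a one-liner away.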

Oscar


