[Python-Dev] PEP 450 adding statistics module (original) (raw)

Oscar Benjamin oscar.j.benjamin at gmail.com
Fri Aug 16 09:47:55 CEST 2013


On 15 August 2013 14:08, Steven D'Aprano <steve at pearwood.info> wrote:

- The API doesn't really feel very Pythonic to me. For example, we write: mystring.rjust(width) dict.items() rather than mystring.justify(width, "right") or dict.iterate("items"). So I think individual methods is a better API, and one which is more familiar to most Python users. The only innovation (if that's what it is) is to have median a callable object.

Although you're talking about median() above I think that this same reasoning applies to the mode() signature. In the reference implementation it has the signature:

def mode(data, max_modes=1): ...

The behaviour is that with the default max_modes=1 it will return the unique mode or raise an error if there isn't a unique mode:

mode([1, 2, 3, 3]) 3 mode([]) StatisticsError: no mode mode([1, 1, 2, 3, 3]) AssertionError

You can use the max_modes parameter to specify that more than one mode is acceptable and setting max_modes to 0 or None returns all modes no matter how many. In these cases mode() returns a list:

mode([1, 1, 2, 3, 3], maxmodes=2) [1, 3] mode([1, 1, 2, 3, 3], maxmodes=None) [1, 3]

I can't think of a situation where 1 or 2 modes are acceptable but 3 is not. The only forms I can imagine using are mode(data) to get the unique mode if it exists and mode(data, max_modes=None) to get the set of all modes. But for that usage it would be better to have a boolean flag and then either way you're at the point where it would normally become two functions.

Also I dislike changing the return type based on special numeric values:

mode([1, 2, 3, 3], maxmodes=0) [3] mode([1, 2, 3, 3], maxmodes=1) 3 mode([1, 2, 3, 3], maxmodes=2) [3] mode([1, 2, 3, 3], maxmodes=3) [3]

My preference would be to have two functions, one called e.g. modes() and one called mode(). modes() always returns a list of the most frequent values no matter how many. mode() returns a unique mode if there is one or raises an error. I think that that would be simpler to document and easier to learn and use. If the user is for whatever reason happy with 1 or 2 modes but not 3 then they can call modes() and check for themselves.

Also I think that:

modes([]) [] but I expect others to disagree.

Oscar



More information about the Python-Dev mailing list