msg192805
Author: Marco Buttu (marco.buttu)
Date: 2013-07-10 13:11
The documentation of sum():

    Returns the sum of a sequence of numbers (NOT strings) plus the value
    of parameter 'start' (which defaults to 0). When the sequence is
    empty, returns start.

A. According to PEP 8 it should be "Return the sum..." and "When the sequence is empty, return start.", like the other docstrings. For instance:

>>> print(len.__doc__)
len(object) -> integer

Return the number of items of a sequence or mapping.

B. When the second argument is a tuple or a list, you can add sequences of sequences:

>>> sum([['a', 'b', 'c'], [4]], [])
['a', 'b', 'c', 4]
>>> sum(((1, 2, 3), (1,)), (1,))
(1, 1, 2, 3, 1)

C. sum() accepts more than just sequences:

>>> sum({1: 'one', 2: 'two'})
3

Maybe it is not a good idea to give a complete description of sum() in the docstring, but perhaps something "good enough". In any case, I think the departure from the PEP 8 recommendation should be fixed.
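The following Python 3 sketch restates points B and C. It also shows the empty-iterable case and the string case that the docstring mentions. The variable names are illustrative only. The string check assumes CPython, which rejects a str start value with a TypeError.

```python
# Behaviours of the built-in sum() discussed in this message (CPython 3).

# Lists and tuples can be "summed" when start has a matching type,
# because sum() applies + repeatedly, beginning with start.
nested_lists = sum([['a', 'b', 'c'], [4]], [])
nested_tuples = sum(((1, 2, 3), (1,)), (1,))

# Any iterable works, not just sequences: iterating a dict yields its keys.
dict_keys_total = sum({1: 'one', 2: 'two'})

# An empty iterable returns start unchanged.
empty_default = sum([])          # start defaults to 0
empty_with_start = sum([], [])   # the given start is returned

# A str start value is refused explicitly; ''.join() is the intended tool.
try:
    sum(['a', 'b'], '')
except TypeError as exc:
    string_error = str(exc)
else:
    string_error = None

print(nested_lists)
print(nested_tuples)
print(dict_keys_total)
print(empty_default, empty_with_start)
print(string_error)
```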
|
|
msg192808
Author: R. David Murray (r.david.murray)
Date: 2013-07-10 13:51
Perhaps we could add something like "Also works, though possibly inefficiently, on any iterable whose elements support addition". Most of the Sphinx documentation for sum() is about what to use instead, and that doesn't really seem appropriate for a docstring. So it may indeed be best not to mention it in the docstring at all.
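The "what to use instead" part of the Sphinx documentation points to three alternatives: str.join for strings, math.fsum for extended-precision floating-point sums, and itertools.chain for concatenating iterables. The following minimal sketch shows each of them. The sample data is made up for illustration.

```python
import itertools
import math

words = ['spam', 'eggs', 'ham']
floats = [0.1] * 10
lists = [[1, 2], [3], [4, 5, 6]]

# Strings: sum() refuses a str start value; str.join is the idiomatic way.
joined = ''.join(words)

# Floats: math.fsum tracks exact partial sums to avoid accumulated rounding error.
precise_total = math.fsum(floats)

# Iterables: itertools.chain yields items lazily, without building
# an intermediate list for every step.
flattened = list(itertools.chain.from_iterable(lists))

print(joined, precise_total, flattened)
```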
|
|
msg192811
Author: Ronald Oussoren (ronaldoussoren)
Date: 2013-07-10 14:04
There's an annoyingly long discussion about sum() on python-ideas. IMHO the documentation should mention, as it does now, that sum is intended to be used with a sequence of numbers, even though it does work with most objects that support the + operator (for example, by implementing __add__). In particular, using sum with a sequence of lists or tuples is extremely inefficient.

The fact that sum({1: 'a', 2: 'b'}) works is a side effect of how Python iterates over containers (iterating over a dict yields its keys). IMHO it doesn't have to be documented in every function that accepts a sequence as an argument.
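The point about objects that implement __add__ can be illustrated with a short sketch. `Vector2D` is a hypothetical type invented for this example. It defines __add__ but not __radd__. Because sum() begins with start=0 and `0 + Vector2D(...)` is unsupported, the caller has to supply a matching "zero" as start.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vector2D:
    """Hypothetical 2-D vector type, used only to illustrate sum()."""
    x: float
    y: float

    def __add__(self, other):
        if not isinstance(other, Vector2D):
            return NotImplemented
        return Vector2D(self.x + other.x, self.y + other.y)


vectors = [Vector2D(1, 2), Vector2D(3, 4), Vector2D(5, 6)]

# Works: start is a Vector2D, so every step calls Vector2D.__add__.
total = sum(vectors, Vector2D(0, 0))

# Fails: the default start is 0, and int + Vector2D has no implementation
# (Vector2D defines no __radd__), so Python raises TypeError.
try:
    sum(vectors)
except TypeError:
    needs_start = True
else:
    needs_start = False

print(total, needs_start)
```

An alternative design is to add `__radd__` so that `0 + v` returns `v`. Passing an explicit start keeps the type simpler.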
|
|
msg192814
Author: R. David Murray (r.david.murray)
Date: 2013-07-10 14:44
OK, so your vote is to leave the docstring alone (except for the PEP 8 changes), right?
|
|
msg192815
Author: Ronald Oussoren (ronaldoussoren)
Date: 2013-07-10 14:50
Yes, the docstring isn't meant to be exhaustive documentation. The manual is more exhaustive and, as you noted, already contains links to alternatives.
|
|
msg192816
Author: Ezio Melotti (ezio.melotti)
Date: 2013-07-10 15:13
Currently sum() is intended to work with numbers and explicitly forbids strings (as noted in the docstring), but it also works with other types (even though it's inefficient). If we want to document this, a possible wording might be:

    Returns the sum of a sequence of numbers plus the value of parameter
    'start' (which defaults to 0). When the sequence is empty, returns
    start. Using sum() with a sequence of strings is not allowed, and
    might be inefficient with sequences of other types.

We should also consider that the implementation/behavior might change in the future, but we can always update the docstring again.

+1 on the PEP 8 changes.
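To show what the proposed wording describes, here is a pure-Python approximation. `py_sum` is a hypothetical name. This is not CPython's implementation, which is written in C and has fast paths for int and float. The docstring uses the imperative mood ("Return ...") that the PEP 8 change asks for. The string check roughly mirrors the built-in, which refuses str, bytes, and bytearray start values.

```python
def py_sum(iterable, start=0):
    """Return the sum of an iterable of numbers plus the value of 'start'.

    When the iterable is empty, return start. Strings are not allowed,
    and other types that support + may work but can be inefficient.
    """
    # Approximate the built-in's explicit refusal of string-like start values.
    if isinstance(start, (str, bytes, bytearray)):
        raise TypeError("py_sum() can't sum strings; use ''.join(seq) instead")
    result = start
    for item in iterable:
        # Each step builds a new object; for lists this copies everything so far.
        result = result + item
    return result


try:
    py_sum(['a', 'b'], '')
except TypeError as exc:
    rejected = str(exc)
else:
    rejected = None

print(py_sum([1, 2, 3]), py_sum([], 5), py_sum([[1], [2, 3]], []), rejected)
```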
|
|
msg192818
Author: Marco Buttu (marco.buttu)
Date: 2013-07-10 15:21
After reading Ronald's comment, I realized it is better to keep it simple, so I agree with him. The "extremely inefficient" argument seems less important, though (Python 3.3):

$ python -m timeit -s "a=['a']*10000; b=['b']*10000; a+b"
100000000 loops, best of 3: 0.00831 usec per loop
$ python -m timeit -s "a=['a']*10000; b=['b']*10000; sum([a, b], [])"
100000000 loops, best of 3: 0.0087 usec per loop
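There is a caveat about the two commands above. Both expressions sit inside the -s (setup) string, which runs once and is not timed. With no statement argument, timeit times its default statement, pass, which explains the ~0.008 usec figures. The equivalent command-line form that times the expression itself is `python -m timeit -s "a=['a']*10000; b=['b']*10000" "a+b"`.

The following sketch does the same through the timeit module. Its repeat and number values are arbitrary choices, and no specific timings are claimed because they depend on the machine. Note also that summing just two lists does not exercise the many-lists case that Ronald describes.

```python
import timeit

setup = "a = ['a'] * 10000; b = ['b'] * 10000"

# The expression under test goes in stmt; setup runs before each repeat and is not timed.
concat_times = timeit.repeat("a + b", setup=setup, repeat=3, number=1000)
sum_times = timeit.repeat("sum([a, b], [])", setup=setup, repeat=3, number=1000)

# Report the best run, converted to microseconds per call.
best_concat_usec = min(concat_times) / 1000 * 1e6
best_sum_usec = min(sum_times) / 1000 * 1e6
print(f"a + b:           {best_concat_usec:.1f} usec per loop")
print(f"sum([a, b], []): {best_sum_usec:.1f} usec per loop")
```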
|
|
msg192819
Author: Ronald Oussoren (ronaldoussoren)
Date: 2013-07-10 15:33
Concatenating a sequence of lists with sum is inefficient because it (currently) does a lot of copying, and that becomes noticeable when you sum a larger number of lists. Note how using sum to add 200 lists takes more than four times as long as adding 100 lists:

ronald@gondolin[0]$ python -m timeit -s "lists=[['a']*100 for i in range(100)]" "sum(lists, [])"
100 loops, best of 3: 2.04 msec per loop
ronald@gondolin[0]$ python -m timeit -s "lists=[['a']*100 for i in range(200)]" "sum(lists, [])"
100 loops, best of 3: 9.2 msec per loop

Also note how using itertools.chain is both a lot faster and behaves better:

ronald@gondolin[0]$ python -m timeit -s "import itertools; lists=[['a']*100 for i in range(100)]" "list(itertools.chain.from_iterable(lists))"
10000 loops, best of 3: 165 usec per loop
ronald@gondolin[0]$ python -m timeit -s "import itertools; lists=[['a']*100 for i in range(100)]" "list(itertools.chain.from_iterable(lists))"
10000 loops, best of 3: 155 usec per loop

(I used Python 2.7 for this; the same behavior can be seen with Python 3.)

See also #18305, which proposed a small change to how sum works that would fix the performance problems for summing a sequence of lists (before going too far and proposing to special-case tuples and strings).
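A simple counting model explains why the 200-list run is so much slower. Assume, as described above, that each step of sum(lists, []) builds a fresh list with `acc + item`. Then step i copies i * k elements, where k is the length of each list, so the total is k * n(n+1)/2. That total grows quadratically in n. The helper functions below are hypothetical and only count copies; they do not measure time. Doubling n from 100 to 200 gives a predicted ratio of about 4, roughly in line with the measured 2.04 vs 9.2 msec.

```python
import itertools


def copies_for_repeated_concat(num_lists, list_len):
    """Count element copies made by acc = acc + item over equal-length lists.

    Step i builds a new list of i * list_len elements, so the total is
    list_len * (1 + 2 + ... + num_lists), which grows quadratically.
    """
    return sum(i * list_len for i in range(1, num_lists + 1))


def copies_for_chain(num_lists, list_len):
    """Count element copies when each element is appended to one list once.

    This ignores the extra copying from the list's amortized resizing.
    """
    return num_lists * list_len


for n in (100, 200):
    print(n, copies_for_repeated_concat(n, 100), copies_for_chain(n, 100))

# Same result as sum(lists, []), but with linear work.
lists = [['a'] * 100 for _ in range(200)]
flat = list(itertools.chain.from_iterable(lists))
```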
|
|
msg192831
Author: Roundup Robot (python-dev)
Date: 2013-07-10 20:23
New changeset 4b3b87719e2c by R David Murray in branch '3.3':
#18424: PEP8ify the tense of the sum docstring.
http://hg.python.org/cpython/rev/4b3b87719e2c

New changeset 38b42ffdf86b by R David Murray in branch 'default':
Merge: #18424: PEP8ify the tense of the sum docstring.
http://hg.python.org/cpython/rev/38b42ffdf86b

New changeset c5f5b5e89a94 by R David Murray in branch '2.7':
#18424: PEP8ify the tense of the sum docstring.
http://hg.python.org/cpython/rev/c5f5b5e89a94
|
|
msg192832
Author: R. David Murray (r.david.murray)
Date: 2013-07-10 20:24
OK, PEP 8 changes committed.
|
|