Issue 31270: Simplify documentation of itertools.zip_longest (original) (raw)
Created on 2017-08-24 16:03 by rami, last changed 2022-04-11 14:58 by admin. This issue is now closed.
Messages (5)
Author: Raphael Michel (rami) *
Date: 2017-08-24 16:03
The documentation given for itertools.zip_longest contains a "roughly equivalent" pure-python implementation of the function that is intended to help the user understand what zip_longest does on a functional level.
However, the given implementation is very complicated to read for newcomers and experienced Python programmers alike, as it uses a custom-defined exception for control flow handling, a nested function, a condition that always is true if any arguments are passed ("while iterators"), as well as two other non-trivial functions from itertools (chain and repeat).
For future reference, this is the currently given implementation:
def zip_longest(*args, **kwds):
# zip_longest('ABCD', 'xy', fillvalue='-') --> Ax By C- D-
fillvalue = kwds.get('fillvalue')
iterators = [iter(it) for it in args]
while True:
exhausted = 0
values = []
for it in iterators:
try:
values.append(next(it))
except StopIteration:
values.append(fillvalue)
exhausted += 1
if exhausted < len(args):
yield tuple(values)
else:
break
This is way more complex than necessary to teach the concept of zip_longest. With this issue, I will submit a pull request with a new example implementation that seems to be the same level of "roughly equivalent" but is much easier to read, since it only uses two loops and now complicated flow
def zip_longest(*args, **kwds):
# zip_longest('ABCD', 'xy', fillvalue='-') --> Ax By C- D-
fillvalue = kwds.get('fillvalue')
iterators = [iter(it) for it in args]
while True:
exhausted = 0
values = []
for it in iterators:
try:
values.append(next(it))
except StopIteration:
values.append(fillvalue)
exhausted += 1
if exhausted < len(args):
yield tuple(values)
else:
break
Looking at the C code of the actual implementation, I don't see that any one of the two implementations is obviously "more equivalent". I'm unsure about performance -- I haven't tried them on that but I don't think that's the point of this learning implementation.
I ran all tests from Lib/test/test_itertools.py against both the old and the new implementation. The new implementation fails at 3 tests, while the old implementation failed at four. Two of the remaining failures are related to TypeErrors not being thrown on invalid input, one of them is related to pickling the resulting object. I believe all three of them are fine to ignore in this sample, as it is not relevant to the documentation purpose.
Therefore, I believe the documentation should be changed like suggested. I'd be happy for any feedback or further ideas to improve its readability!
Author: Raphael Michel (rami) *
Date: 2017-08-24 16:30
I just noticed that in my post I accidentally pasted MY implementation twice instead of the old one, sorry for that. Here's the old one for quick comparison:
class ZipExhausted(Exception): pass
def zip_longest(*args, **kwds): # zip_longest('ABCD', 'xy', fillvalue='-') --> Ax By C- D- fillvalue = kwds.get('fillvalue') counter = len(args) - 1 def sentinel(): nonlocal counter if not counter: raise ZipExhausted counter -= 1 yield fillvalue fillers = repeat(fillvalue) iterators = [chain(it, sentinel(), fillers) for it in args] try: while iterators: yield tuple(map(next, iterators)) except ZipExhausted: pass
Author: R. David Murray (r.david.murray) *
Date: 2017-08-24 19:32
Thanks for wanting to improve the documentation.
Raymond will address this definitively, but unless I'm mistaken part of the purpose of the examples is to show how the various itertools can be used. If that is true, then in the context of the overall itertools documentation I think the current example has more teaching value than your suggested revision.
Author: Raphael Michel (rami) *
Date: 2017-08-25 09:33
Well, I could think of a way to still use repeat() here that also is pretty clean except for the fact that it fails if all inputs to zip_longest are repeat() iterators themselves (which would here lead to an empty iterator while it would otherwise lead to an infinite one):
def zip_longest(*args, **kwds):
# zip_longest('ABCD', 'xy', fillvalue='-') --> Ax By C- D-
fillvalue = kwds.get('fillvalue')
iterators = [iter(it) for it in args]
while True:
values = []
for i, it in enumerate(iterators):
try:
values.append(next(it))
except StopIteration:
values.append(fillvalue)
iterators[i] = repeat(fillvalue)
if all(isinstance(it, repeat) for it in iterators):
break
else:
yield tuple(values)
Keeping chain() in use here just for the sake of using it is not worth it, I believe.
Author: Raymond Hettinger (rhettinger) *
Date: 2017-09-07 21:51
New changeset 3147b0422cbeb98065666ccf95ab6845ac800fd4 by Raymond Hettinger in branch 'master': bpo-31270: Modification of Pr 3200 (#3427) https://github.com/python/cpython/commit/3147b0422cbeb98065666ccf95ab6845ac800fd4
History
Date
User
Action
Args
2022-04-11 14:58:51
admin
set
github: 75453
2017-09-07 21:51:11
rhettinger
set
messages: +
2017-09-07 21:03:37
rhettinger
set
status: open -> closed
resolution: fixed
stage: patch review -> resolved
2017-09-07 19:46:43
rhettinger
set
pull_requests: + <pull%5Frequest3424>
2017-08-25 09:33:42
rami
set
messages: +
2017-08-24 19:32:34
r.david.murray
set
nosy: + r.david.murray
messages: +
2017-08-24 18:01:39
rhettinger
set
assignee: docs@python -> rhettinger
nosy: + rhettinger
2017-08-24 16:30:23
rami
set
messages: +
2017-08-24 16:19:31
Mariatta
set
stage: patch review
versions: + Python 3.6
2017-08-24 16:06:32
rami
set
pull_requests: + <pull%5Frequest3239>
2017-08-24 16:03:33
rami
create