Issue 31270: Simplify documentation of itertools.zip_longest (original) (raw)

Created on 2017-08-24 16:03 by rami, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (5)

msg300788 - (view)

Author: Raphael Michel (rami) *

Date: 2017-08-24 16:03

The documentation given for itertools.zip_longest contains a "roughly equivalent" pure-python implementation of the function that is intended to help the user understand what zip_longest does on a functional level.

However, the given implementation is very complicated to read for newcomers and experienced Python programmers alike, as it uses a custom-defined exception for control flow handling, a nested function, a condition that always is true if any arguments are passed ("while iterators"), as well as two other non-trivial functions from itertools (chain and repeat).

For future reference, this is the currently given implementation:

def zip_longest(*args, **kwds):
    # zip_longest('ABCD', 'xy', fillvalue='-') --> Ax By C- D-
    fillvalue = kwds.get('fillvalue')
    iterators = [iter(it) for it in args]

    while True:
        exhausted = 0
        values = []

        for it in iterators:
            try:
                values.append(next(it))
            except StopIteration:
                values.append(fillvalue)
                exhausted += 1

        if exhausted < len(args):
            yield tuple(values)
        else:
            break

This is way more complex than necessary to teach the concept of zip_longest. With this issue, I will submit a pull request with a new example implementation that seems to be the same level of "roughly equivalent" but is much easier to read, since it only uses two loops and now complicated flow

def zip_longest(*args, **kwds):
    # zip_longest('ABCD', 'xy', fillvalue='-') --> Ax By C- D-
    fillvalue = kwds.get('fillvalue')
    iterators = [iter(it) for it in args]

    while True:
        exhausted = 0
        values = []

        for it in iterators:
            try:
                values.append(next(it))
            except StopIteration:
                values.append(fillvalue)
                exhausted += 1

        if exhausted < len(args):
            yield tuple(values)
        else:
            break

Looking at the C code of the actual implementation, I don't see that any one of the two implementations is obviously "more equivalent". I'm unsure about performance -- I haven't tried them on that but I don't think that's the point of this learning implementation.

I ran all tests from Lib/test/test_itertools.py against both the old and the new implementation. The new implementation fails at 3 tests, while the old implementation failed at four. Two of the remaining failures are related to TypeErrors not being thrown on invalid input, one of them is related to pickling the resulting object. I believe all three of them are fine to ignore in this sample, as it is not relevant to the documentation purpose.

Therefore, I believe the documentation should be changed like suggested. I'd be happy for any feedback or further ideas to improve its readability!

msg300790 - (view)

Author: Raphael Michel (rami) *

Date: 2017-08-24 16:30

I just noticed that in my post I accidentally pasted MY implementation twice instead of the old one, sorry for that. Here's the old one for quick comparison:

class ZipExhausted(Exception): pass

def zip_longest(*args, **kwds): # zip_longest('ABCD', 'xy', fillvalue='-') --> Ax By C- D- fillvalue = kwds.get('fillvalue') counter = len(args) - 1 def sentinel(): nonlocal counter if not counter: raise ZipExhausted counter -= 1 yield fillvalue fillers = repeat(fillvalue) iterators = [chain(it, sentinel(), fillers) for it in args] try: while iterators: yield tuple(map(next, iterators)) except ZipExhausted: pass

msg300799 - (view)

Author: R. David Murray (r.david.murray) * (Python committer)

Date: 2017-08-24 19:32

Thanks for wanting to improve the documentation.

Raymond will address this definitively, but unless I'm mistaken part of the purpose of the examples is to show how the various itertools can be used. If that is true, then in the context of the overall itertools documentation I think the current example has more teaching value than your suggested revision.

msg300830 - (view)

Author: Raphael Michel (rami) *

Date: 2017-08-25 09:33

Well, I could think of a way to still use repeat() here that also is pretty clean except for the fact that it fails if all inputs to zip_longest are repeat() iterators themselves (which would here lead to an empty iterator while it would otherwise lead to an infinite one):

def zip_longest(*args, **kwds):
    # zip_longest('ABCD', 'xy', fillvalue='-') --> Ax By C- D-
    fillvalue = kwds.get('fillvalue')
    iterators = [iter(it) for it in args]

    while True:
        values = []

        for i, it in enumerate(iterators):
            try:
                values.append(next(it))
            except StopIteration:
                values.append(fillvalue)
                iterators[i] = repeat(fillvalue)

        if all(isinstance(it, repeat) for it in iterators):
            break
        else:
            yield tuple(values)

Keeping chain() in use here just for the sake of using it is not worth it, I believe.

msg301635 - (view)

Author: Raymond Hettinger (rhettinger) * (Python committer)

Date: 2017-09-07 21:51

New changeset 3147b0422cbeb98065666ccf95ab6845ac800fd4 by Raymond Hettinger in branch 'master': bpo-31270: Modification of Pr 3200 (#3427) https://github.com/python/cpython/commit/3147b0422cbeb98065666ccf95ab6845ac800fd4

History

Date

User

Action

Args

2022-04-11 14:58:51

admin

set

github: 75453

2017-09-07 21:51:11

rhettinger

set

messages: +

2017-09-07 21:03:37

rhettinger

set

status: open -> closed
resolution: fixed
stage: patch review -> resolved

2017-09-07 19:46:43

rhettinger

set

pull_requests: + <pull%5Frequest3424>

2017-08-25 09:33:42

rami

set

messages: +

2017-08-24 19:32:34

r.david.murray

set

nosy: + r.david.murray
messages: +

2017-08-24 18:01:39

rhettinger

set

assignee: docs@python -> rhettinger

nosy: + rhettinger

2017-08-24 16:30:23

rami

set

messages: +

2017-08-24 16:19:31

Mariatta

set

stage: patch review
versions: + Python 3.6

2017-08-24 16:06:32

rami

set

pull_requests: + <pull%5Frequest3239>

2017-08-24 16:03:33

rami

create