[Python-Dev] iterators (original) (raw)

Guido van Rossum guido@python.org
Fri, 18 Aug 2000 16:57:15 -0400

Previous message: [Python-Dev] Re: indexing, indices(), irange(), list.items() (was RE: [Python-Dev] Lockstep iteration - eureka!)
Next message: [Python-Dev] iterators
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Paul Prescod wrote:

I don't think of iterators as indexing in terms of numbers. Otherwise I could do this:

>>> a={0:"zero",1:"one",2:"two",3:"three"} >>> for i in a: ... print i ... So from a Python user's point of view, for-looping has nothing to do with integers. From a Python class/module creator's point of view it does have to do with integers. I wouldn't be either surprised nor disappointed if that changed one day.

Bingo!

I've long had an idea for generalizing 'for' loops using iterators. This is more a Python 3000 thing, but I'll explain it here anyway because I think it's relevant. Perhaps this should become a PEP? (Maybe we should have a series of PEPs with numbers in the 3000 range for Py3k ideas?)

The statement

for in :

should translate into this kind of pseudo-code:

variant 1

__temp = .newiterator() while 1: try: = __temp.next() except ExhaustedIterator: break

or perhaps (to avoid the relatively expensive exception handling):

variant 2

__temp = .newiterator() while 1: __flag, <variable = __temp.next() if not __flag: break

In variant 1, the next() method returns the next object or raises ExhaustedIterator. In variant 2, the next() method returns a tuple (, ) where is 1 to indicate that is valid or 0 to indicate that there are no more items available. I'm not crazy about the exception, but I'm even less crazy about the more complex next() return value (careful observers may have noticed that I'm rarely crazy about flag variables :-).

Another argument for variant 1 is that variant 2 changes what is after the loop is exhausted, compared to current practice: currently, it keeps the last valid value assigned to it. Most likely, the next() method returns None when the sequence is exhausted. It doesn't make a lot of sense to require it to return the last item of the sequence -- there may not be a last item, if the sequence is empty, and not all sequences make it convenient to keep hanging on to the last item in the iterator, so it's best to specify that next() returns (0, None) when the sequence is exhausted.

(It would be tempting to suggeste a variant 1a where instead of raising an exception, next() returns None when the sequence is exhausted, but this won't fly: you couldn't iterate over a list containing some items that are None.)

Side note: I believe that the iterator approach could actually speed up iteration over lists compared to today's code. This is because currently the interation index is a Python integer object that is stored on the stack. This means an integer add with overflow check, allocation, and deallocation on each iteration! But the iterator for lists (and other basic sequences) could easily store the index as a C int! (As long as the sequence's length is stored in an int, the index can't overflow.)

[Warning: thinking aloud ahead!]

Once we have the concept of iterators, we could support explicit use of them as well. E.g. we could use a variant of the for statement to iterate over an existing iterator:

for over :

which would (assuming variant 1 above) translate to:

while 1: try: = .next() except ExhaustedIterator: break

This could be used in situations where you have a loop iterating over the first half of a sequence and a second loop that iterates over the remaining items:

it = something.newiterator() for x over it: if time_to_start_second_loop(): break do_something() for x over it: do_something_else()

Note that the second for loop doesn't reset the iterator -- it just picks up where the first one left off! (Detail: the x that caused the break in the first loop doesn't get dealt with in the second loop.)

I like the iterator concept because it allows us to do things lazily. There are lots of other possibilities for iterators. E.g. mappings could have several iterator variants to loop over keys, values, or both, in sorted or hash-table order. Sequences could have an iterator for traversing them backwards, and a few other ones for iterating over their index set (cf. indices()) and over (index, value) tuples (cf. irange()). Files could be their own iterator where the iterator is almost the same as readline() except it raises ExhaustedIterator at EOF instead of returning "". A tree datastructure class could have an associated iterator class that maintains a "finger" into the tree.

Hm, perhaps iterators could be their own iterator? Then if 'it' were an iterator, it.newiterator() would return a reference to itself (not a copy). Then we wouldn't even need the 'over' alternative syntax. Maybe the method should be called iterator() then, not newiterator(), to avoid suggesting anything about the newness of the returned iterator.

Other ideas:

Iterators could have a backup() method that moves the index back (or raises an exception if this feature is not supported, e.g. when reading data from a pipe).
Iterators over certain sequences might support operations on the underlying sequence at the current position of the iterator, so that you could iterate over a sequence and occasionally insert or delete an item (or a slice).

Of course iterators also connect to generators. The basic list iterator doesn't need coroutines or anything, it can be done as follows:

class Iterator: def init(self, seq): self.seq = seq self.ind = 0 def next(self): if self.ind >= len(self.seq): raise ExhaustedIterator val = self.seq[self.ind] self.ind += 1 return val

so that .iterator() could just return Iterator() -- at least conceptually.

But for other data structures the amount of state needed might be cumbersome. E.g. a tree iterator needs to maintain a stack, and it's much easier to code that using a recursive Icon-style generator than by using an explicit stack. On the other hand, I remember reading an article a while ago (in Dr. Dobbs?) by someone who argued (in a C++ context) that such recursive solutions are very inefficient, and that an explicit stack (1) is really not that hard to code, and (2) gives much more control over the memory and time consumption of the code. On the third hand, some forms of iteration really are expressed much more clearly using recursion. On the fourth hand, I disagree with Matthias ("Dr. Scheme") Felleisen about recursion as the root of all iteration. Anyway, I believe that iterators (as explained above) can be useful even if we don't have generators (in the Icon sense, which I believe means coroutine-style).

--Guido

Previous message: [Python-Dev] Re: indexing, indices(), irange(), list.items() (was RE: [Python-Dev] Lockstep iteration - eureka!)
Next message: [Python-Dev] iterators
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]