Issue 11889: 'enumerate' 'start' parameter documentation is confusing

""" A point of confusion using the builtin function 'enumerate' and enlightenment for those who, like me, have been confused.

Note, this confusion was discussed at length at

http://bugs.python.org/issue2831

prior to the 'start' parameter being added to 'enumerate'. The confusion discussed herein was foreseen in that discussion, and ultimately discounted. There remains, IMO, an issue with the clarity of the documentation that needs to be addressed. That is, the closed issue at

http://bugs.python.org/issue8635

concerning the 'enumerate' docstring does not address the confusion that prompted this posting.

Consider:

    x=['a','b','c','d','e']
    y=['f','g','h','i','j']
    print 0,y[0]
    for i,c in enumerate(y,1):
        print i,c
        if c=='g':
            print x[i], 'y[%i]=g' % (i)
            continue
        print x[i]

Under Python 2.7 this code produces the following unexpected output, which is apparently the correct behavior (see commentary below). The example is an abstract simplification of a program defect encountered in practice:

    0 f
    1 f
    b
    2 g
    c y[2]=g
    3 h
    d
    4 i
    e
    5 j

    Traceback (most recent call last):
      File "Untitled", line 9
        print x[i]
    IndexError: list index out of range

Help on 'enumerate' yields:

>>> help(enumerate)
Help on class enumerate in module __builtin__:

class enumerate(object)
 |  enumerate(iterable[, start]) -> iterator for index, value of iterable
 |
 |  Return an enumerate object. iterable must be another object that supports
 |  iteration. The enumerate object yields pairs containing a count (from
 |  start, which defaults to zero) and a value yielded by the iterable argument.
 |  enumerate is useful for obtaining an indexed list:
 |      (0, seq[0]), (1, seq[1]), (2, seq[2]), ...
 |
 |  Methods defined here:
 |
 |  __getattribute__(...)
 |      x.__getattribute__('name') <==> x.name
 |
 |  __iter__(...)
 |      x.__iter__() <==> iter(x)
 |
 |  next(...)
 |      x.next() -> the next value, or raise StopIteration
 |
 |  ----------------------------------------------------------------------
 |  Data and other attributes defined here:
 |
 |  __new__ =
 |      T.__new__(S, ...) -> a new object with type S, a subtype of T

Commentary:

The expected output was:

    0 f
    1 g
    b y[1]=g
    2 h
    c
    3 i
    d
    4 j
    e

That is, it was expected that the iterator would yield a value corresponding to the index, whether the index started at zero or not. Using the notation of the doc string, with start=1, the expected behavior was:

| (1, seq[1]), (2, seq[2]), (3, seq[3]), ...

while the actual behavior is:

| (1, seq[0]), (2, seq[1]), (3, seq[2]), ...
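
The actual behavior is easy to confirm. The following is a minimal sketch in Python 3 syntax (the post itself uses Python 2.7), reusing the list y from the example above: 'start' changes only the count, and iteration still begins at the first item.

```python
# Sketch in Python 3 syntax; y is the list from the example in this post.
y = ['f', 'g', 'h', 'i', 'j']

# start shifts only the count; iteration still begins at y[0].
pairs = list(enumerate(y, 1))
print(pairs)

# Each count is start + k and is paired with y[k], so count and index
# differ by start for every item.
assert pairs == [(1 + k, y[k]) for k in range(len(y))]
```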

The practical problem in the real world code was to do something special with the zero index value of x and y, then run through the remaining values, doing one of two things with x and y, correlated, depending on the value of y.

I can see now that the doc string does in fact correctly specify the actual behavior: nowhere does it say the iterator will begin at any other place than the beginning, so this is not a Python bug. I do, however, question the general usefulness of such behavior. Normally, indices and values are expected to be correlated.

The correct behavior can be simply implemented without using 'enumerate':

    x=['a','b','c','d','e']
    y=['f','g','h','i','j']
    print 0,y[0]
    for i in xrange(1,len(y)):
        c=y[i]
        print i,c
        if c=='g':
            print x[i], 'y[%i]=g' % (i)
            continue
        print x[i]

This produces the expected results.

If one insists on using enumerate to produce the correct behavior in this example, it can be done as follows:
"""
x=['a','b','c','d','e']
y=['f','g','h','i','j']
seq=enumerate(y)
print '%s %s' % seq.next()
for i,c in seq:
    print i,c
    if c=='g':
        print x[i], 'y[%i]=g' % (i)
        continue
    print x[i]
"""
This version produces the expected results, while achieving clarity comparable to that which was sought in the original incorrect code.
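
Another possibility, offered only as a sketch and not as part of the original post, is to skip the first item by slicing and pass the same offset as 'start', so that count and index stay in step. It is written in Python 3 syntax and assumes y is a sequence that supports slicing; the output lines are collected in a list (a name chosen here for illustration) so they can be inspected before printing.

```python
# Sketch in Python 3 syntax; x and y are the lists from the example above.
x = ['a', 'b', 'c', 'd', 'e']
y = ['f', 'g', 'h', 'i', 'j']

lines = ['%d %s' % (0, y[0])]  # special handling of the first item

# Skip y[0] by slicing and start the count at 1, so that c == y[i].
for i, c in enumerate(y[1:], 1):
    lines.append('%d %s' % (i, c))
    if c == 'g':
        lines.append('%s y[%i]=g' % (x[i], i))
        continue
    lines.append(x[i])

print('\n'.join(lines))
```

The slice copies the list, which is harmless for short sequences; for a large sequence or a plain iterator, itertools.islice(y, 1, None) could be passed to enumerate instead.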

Looking a little deeper, the python documentation on enumerate states:

enumerate(sequence[, start=0])
    Return an enumerate object. sequence must be a sequence, an iterator, or some other object which supports iteration. The next() method of the iterator returned by enumerate() returns a tuple containing a count (from start which defaults to 0) and the corresponding value obtained from iterating over iterable. enumerate() is useful for obtaining an indexed series: (0, seq[0]), (1, seq[1]), (2, seq[2]),

This pretty clearly implies that the value corresponds to the index, so perhaps there really is an issue here. Have at it. I'm going back to work, using 'enumerate' as it actually is, now that I clearly understand it.

One thing is certain: the documentation has to be clarified, for the confusion foreseen prior to adding the start parameter is very real. """

Note: the 3.x documentation correctly gives the signature as enumerate(iterable, start) rather than enumerate(sequence, start).

I agree that the current entry is a bit awkward. Perhaps the doc would be clearer with a reference to zipping. Removing the unneeded definition of iterable (which should be linked to the definition in the glossary, along with iterator), my suggestion is:
'''
enumerate(iterable, start=0)
    Return an enumerate object, an iterator of tuples, that zips together a sequence of counts and iterable. Each tuple contains a count and an item from iterable, in that order. The counts begin with start, which defaults to 0. enumerate() is useful for obtaining an indexed series: enumerate(seq) produces (0, seq[0]), (1, seq[1]), (2, seq[2]), .... For another example, which uses start:

    >>> for i, season in enumerate(['Spring','Summer','Fall','Winter'], 1):
    ...     print(i, season)
    1 Spring
    2 Summer
    3 Fall
    4 Winter
'''
Note that I changed the example to use a start of 1 instead of 0, to produce a list in traditional form, which is one reason to have the parameter!
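
The zipping description can be checked directly. The sketch below, in Python 3 syntax, relies only on the standard itertools.count; the name zipped_enumerate is made up for illustration and is not part of any proposal in this issue.

```python
from itertools import count

def zipped_enumerate(iterable, start=0):
    """Illustrative stand-in for enumerate: zip a run of counts with the items."""
    # count(start) yields start, start + 1, start + 2, ... without end;
    # zip stops as soon as the iterable is exhausted.
    return zip(count(start), iterable)

seasons = ['Spring', 'Summer', 'Fall', 'Winter']

# Both spellings pair the counts 1, 2, 3, 4 with the items in order.
assert list(zipped_enumerate(seasons, 1)) == list(enumerate(seasons, 1))

for i, season in zipped_enumerate(seasons, 1):
    print(i, season)
```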

""" Changing the 'enumerate' doc string text from:

| (0, seq[0]), (1, seq[1]), (2, seq[2]), ...

to:

| (start, seq[0]), (start+1, seq[1]), (start+2, seq[2]), ...

would completely disambiguate the doc string at the modest cost of sixteen additional characters, a small price for pellucid clarity.

The proposed changes to the formal documentation also seem to me to be prudent, and I hope at this late writing, they have already been committed.

I conclude with a code fragment for the edification of R. David Murray. """

class numerate(object):
    """
    A demonstration of a plausible incorrect interpretation of the 'enumerate' function's doc string and documentation.
    """
    def __init__(self,seq,start=0):
        self.seq=seq
        self.index=start-1
        try:
            if seq.next: pass #test for iterable
            for i in xrange(start):
                self.seq.next()
        except:
            if type(seq)==dict: self.seq=seq.keys()
            self.seq=iter(self.seq[start:])

    def next(self):
        self.index+=1
        return self.index,self.seq.next()

    def __iter__(self):
        return self

if __name__ == "__main__":
    #s=['spring','summer','autumn','winter']
    s={'spring':'a','summer':'b','autumn':'c','winter':'d'}
    #s=enumerate(s)#,2)
    s=numerate(s,2)
    for t in s:
        print t
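
For readers on Python 3, where xrange, the print statement and the next() method spelling are gone, the same interpretation can be sketched more compactly. This is an illustrative port under the assumption that skipping the first 'start' items is all the class above intends; it is not the poster's code, and it deliberately differs from the builtin.

```python
from itertools import islice

def numerate(iterable, start=0):
    """Return an iterator of (i, item) pairs beginning at position start.

    This mirrors the mistaken reading discussed in this issue: the first
    `start` items are skipped, so for a sequence every pair is (i, seq[i]).
    It is NOT how the builtin enumerate behaves.
    """
    # islice skips the first `start` items of any iterable, including
    # iterators and dicts (a dict iterates over its keys).
    return enumerate(islice(iterable, start, None), start)

s = ['spring', 'summer', 'autumn', 'winter']
print(list(numerate(s, 2)))   # counts match the list indexes
print(list(enumerate(s, 2)))  # builtin: counts are offset from the indexes
```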