peps: e4678d0f2cbb

--- a/pep-0201.txt
+++ b/pep-0201.txt
@@ -84,31 +84,17 @@ The Proposed Solution
generator function, available in the builtin module. This
function is to be called `zip' and has the following signature:

zip(seqa, [seqb, [...]], [pad=])

zip(seqa, [seqb, [...]])

zip() takes one or more sequences and weaves their elements together, just as map(None, ...) does with sequences of equal

length. The optional keyword argument `pad', if supplied, is a
value used to pad all shorter sequences to the length of the
longest sequence. If `pad' is omitted, then weaving stops when
the shortest sequence is exhausted.

It is not possible to pad short lists with different pad values,
nor will zip() ever raise an exception with lists of different
lengths. To accomplish either behavior, the sequences must be
checked and processed before the call to zip() -- but see the Open
Issues below for more discussion.

length. The weaving stops when the shortest sequence is
exhausted.

-Lazy Execution
+Return Value

For performance purposes, zip() does not construct the list of
tuples immediately. Instead it instantiates an object that
implements a __getitem__() method and conforms to the informal
for-loop protocol. This method constructs the individual tuples
on demand.
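
To illustrate the lazy design described above, here is a minimal sketch (not the PEP's own code) of an object that exposes only __getitem__(). A for-loop asks it for index 0, 1, 2, ... until IndexError is raised. The class name LazyZip is hypothetical, and the sketch assumes current Python, which still honors this legacy protocol.

```python
class LazyZip:
    """Hypothetical lazy zipper: builds each tuple only when indexed."""

    def __init__(self, *sequences):
        self._sequences = sequences

    def __getitem__(self, i):
        if not self._sequences:
            raise IndexError(i)
        # An exhausted sequence raises IndexError here, which ends the loop.
        return tuple(s[i] for s in self._sequences)


pairs = []
for item in LazyZip([1, 2, 3, 4], [12, 13]):
    pairs.append(item)
print(pairs)
```

No list of tuples is held in memory until the caller collects one, which is the trade-off the eager design gives up.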

Guido is strongly opposed to lazy execution. See Open Issues.

zip() returns a real Python list, the same way map() does.

Examples
@@ -127,23 +113,9 @@ Examples
>>> zip(a, d)
[(1, 12), (2, 13)]

>>> zip(a, d, pad=0)
[(1, 12), (2, 13), (3, 0), (4, 0)]
>>> zip(a, d, pid=0)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/tmp/python-iKAOxR", line 11, in zip
TypeError: unexpected keyword arguments
>>> zip(a, b, c, d)
[(1, 5, 9, 12), (2, 6, 10, 13)]
>>> zip(a, b, c, d, pad=None)
[(1, 5, 9, 12), (2, 6, 10, 13), (3, 7, 11, None), (4, 8, None, None)]
>>> map(None, a, b, c, d)
[(1, 5, 9, 12), (2, 6, 10, 13), (3, 7, 11, None), (4, 8, None, None)]
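
For readers on current Python, the padded results above can be approximated with itertools.zip_longest and its fillvalue argument. This is a sketch, not part of the PEP. The values of a, b, c and d are assumed, inferred from the outputs shown in the examples.

```python
from itertools import zip_longest

# Assumed inputs, inferred from the example outputs.
a = (1, 2, 3, 4)
b = (5, 6, 7, 8)
c = (9, 10, 11)
d = (12, 13)

# Truncating behavior: stop at the shortest sequence.
truncated = list(zip(a, d))

# Padding behavior: run to the longest sequence, filling the gaps.
padded_zero = list(zip_longest(a, d, fillvalue=0))
padded_none = list(zip_longest(a, b, c, d))  # fillvalue defaults to None
```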

Note that when the sequences are of the same length, zip() is
reversible:
@@ -171,235 +143,60 @@ Reference Implementation
built-in function and helper class. These would ultimately be
replaced by equivalent C code.

class _Zipper:
   def __init__(self, args, kws):
       # Defaults
       self.__padgiven = 0
       if kws.has_key('pad'):
           self.__padgiven = 1
           self.__pad = kws['pad']
           del kws['pad']
       # Assert no unknown arguments are left
       if kws:
           raise TypeError('unexpected keyword arguments')
       self.__sequences = args
       self.__seqlen = len(args)

   def __getitem__(self, i):
       if not self.__sequences:
           raise IndexError
       ret = []
       exhausted = 0
       for s in self.__sequences:
           try:
               ret.append(s[i])
           except IndexError:
               if not self.__padgiven:
                   raise
               exhausted = exhausted + 1
               if exhausted == self.__seqlen:
                   raise
               ret.append(self.__pad)
       return tuple(ret)

   def __len__(self):
       # If we're padding, then len is the length of the longest sequence,
       # otherwise it's the length of the shortest sequence.
       if not self.__padgiven:
           shortest = -1
           for s in self.__sequences:
               slen = len(s)
               if shortest < 0 or slen < shortest:
                   shortest = slen
           if shortest < 0:
               return 0
           return shortest
       longest = 0
       for s in self.__sequences:
           slen = len(s)
           if slen > longest:
               longest = slen
       return longest

   def __cmp__(self, other):
       i = 0
       smore = 1
       omore = 1
       while 1:
           try:
               si = self[i]
           except IndexError:
               smore = 0
           try:
               oi = other[i]
           except IndexError:
               omore = 0
           if not smore and not omore:
               return 0
           elif not smore:
               return -1
           elif not omore:
               return 1
           test = cmp(si, oi)
           if test:
               return test
           i = i + 1

   def __str__(self):
       ret = []
       i = 0
       while 1:
           try:
               ret.append(self[i])
           except IndexError:
               break
           i = i + 1
       return str(ret)
   __repr__ = __str__


def zip(*args, **kws):
   return _Zipper(args, kws)

def zip(*args):
   if not args:
       raise TypeError('zip() expects one or more sequence arguments')
   ret = []
   # find the length of the shortest sequence; taking min() over the
   # list of lengths also works when only one sequence is given
   shortest = min(map(len, args))
   for i in range(shortest):
       item = []
       for s in args:
           item.append(s[i])
       ret.append(tuple(item))
   return ret
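
As a rough cross-check of the eager semantics, the same behavior can be sketched in current Python. The name zip_list is hypothetical, chosen to avoid shadowing the built-in. The sketch assumes indexable sequences with a len(), as the reference code does.

```python
def zip_list(*sequences):
    """Eager zip: return a list of tuples, truncated to the shortest input."""
    if not sequences:
        raise TypeError('zip_list() expects one or more sequence arguments')
    shortest = min(len(s) for s in sequences)
    return [tuple(s[i] for s in sequences) for i in range(shortest)]


a = (1, 2, 3, 4)
d = (12, 13)

two_args = zip_list(a, d)      # truncates to the shorter sequence
one_arg = zip_list((1, 2, 3))  # a list of 1-tuples

try:
    zip_list()
    no_args_raised = False
except TypeError:
    no_args_raised = True
```

The three calls cover truncation, a single argument, and no arguments.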

-Rejected Elaborations -

Some people have suggested that the user be able to specify the
type of the inner and outer containers for the zipped sequence.
This would be specified by additional keyword arguments to zip(),
named `inner' and `outer'.

This elaboration is rejected for several reasons. First, there
really is no outer container, even though there appears to be an
outer list container in the example above. This is simply an
artifact of the repr() of the zipped object. User code can do its
own looping over the zipped object via __getitem__(), and build
any type of outer container for the fully evaluated, concrete
sequence. For example, to build a zipped object with lists as an
outer container, use

+BDFL Pronouncements

   >>> list(zip(sequence_a, sequence_b, sequence_c))

for tuple outer container, use

   >>> tuple(zip(sequence_a, sequence_b, sequence_c))

This type of construction will usually not be necessary though,
since it is expected that zipped objects will most often appear in
for-loops.

Second, allowing the user to specify the inner container
introduces needless complexity and arbitrary decisions. You might
imagine that instead of the default tuple inner container, the
user could prefer a list, or a dictionary, or instances of some
sequence-like class.

Note: the BDFL refers to Guido van Rossum, Python's Benevolent
Dictator For Life.

One problem is the API. Should the argument to `inner' be a type
or a template object? For flexibility, the argument should
probably be a type object (i.e. TupleType, ListType, DictType), or
a class. For classes, the implementation could just pass the zip
element to the constructor. But what about built-in types that
don't have constructors? They would have to be special-cased in
the implementation (i.e. what is the constructor for TupleType?
The tuple() built-in).

Another problem that arises is for zips greater than length two.
Say you had three sequences and you wanted the inner type to be a
dictionary. What would the semantics of the following be?

   >>> zip(sequence_a, sequence_b, sequence_c, inner=DictType)

- The function's name. An earlier version of this PEP included an
  open issue listing 20+ proposed alternative names to zip().  In
  the face of no overwhelmingly better choice, the BDFL strongly
  prefers zip() due to its Haskell[2] heritage.  See version 1.7
  of this PEP for the list of alternatives.

Would the key be (element_a, element_b) and the value be
element_c, or would the key be element_a and the value be
(element_b, element_c)? Or should an exception be thrown?

This suggests that the specification of the inner container type
is needless complexity. It isn't likely that the inner container
will need to be specified very often, and it is easy to roll your
own should you need it. Tuples are chosen for the inner container
type due to their (slight) memory footprint and performance
advantages.
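
The "roll your own" point can be sketched in current Python. The variable names and sample data below are hypothetical. The two dictionary forms correspond to the two readings of a dictionary inner container discussed above.

```python
seq_a = ['x', 'y']
seq_b = [1, 2]
seq_c = [10, 20]

# Lists instead of tuples as the inner container.
as_lists = [list(t) for t in zip(seq_a, seq_b, seq_c)]

# Reading 1: key is (element_a, element_b), value is element_c.
pair_keyed = [{(a, b): c} for a, b, c in zip(seq_a, seq_b, seq_c)]

# Reading 2: key is element_a, value is (element_b, element_c).
first_keyed = [{a: (b, c)} for a, b, c in zip(seq_a, seq_b, seq_c)]
```

Each choice is made explicitly by the caller, so zip() itself never has to pick one.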

- zip() shall be a built-in function.

- -Open Issues -

- Guido opposes lazy evaluation for zip(). He believes zip()
  should return a real list, with an xzip() lazy evaluator added
  later if necessary.

- What should "zip(a)" do? Given

  a = (1, 2, 3); zip(a)

- Optional padding. An earlier version of this PEP proposed an
  optional `pad' keyword argument, which would be used when the
  argument sequences were not the same length.  This is similar
  behavior to the map(None, ...) semantics except that the user
  would be able to specify the pad object.  This has been rejected by
  the BDFL in favor of always truncating to the shortest sequence.

  three outcomes are possible.

  1) Returns [(1,), (2,), (3,)]

     Pros: no special casing in the implementation or in user
     code, and is more consistent with the description of its
     semantics.  Cons: this isn't what map(None, a) would return,
     and may be counter to user expectations.

  2) Returns [1, 2, 3]

     Pros: consistency with map(None, a), and simpler code for
     for-loops, e.g.

     for i in zip(a):

- Lazy evaluation. An earlier version of this PEP proposed that
  zip() return a built-in object that performed lazy evaluation
  using the __getitem__() protocol.  This has been strongly rejected
  by the BDFL in favor of returning a real Python list.  If lazy
  evaluation is desired in the future, the BDFL suggests an xzip()
  function be added.

     instead of

     for (i,) in zip(a):

     Cons: too much complexity and special casing for what should
     be a relatively rare usage pattern.

  3) Raises TypeError

     Pros: zip(a) doesn't make much sense and could be confusing
     to explain.

     Cons: needless restriction

  Current scoring seems to generally favor outcome 1.

- What should "zip()" do?

- zip() with no arguments. The BDFL strongly prefers that this raise a
  TypeError exception.

  Along similar lines, zip() with no arguments (or zip() with just
  a pad argument) can have ambiguous semantics.  Should this
  return no elements or an infinite number?  For these reasons,
  raising a TypeError exception in this case makes the most
  sense.

- The name of the built-in `zip' may cause some initial confusion
  with the zip compression algorithm.  Other suggestions include
  (but are not limited to!): marry, weave, parallel, lace, braid,
  interlace, permute, furl, tuples, lists, stitch, collate, knit,
  plait, fold, with, mktuples, maketuples, totuples, gentuples,
  tupleorama.

  All have disadvantages, and there is no clear unanimous choice,
  therefore the decision was made to go with `zip' because the
  same functionality is available in other languages
  (e.g. Haskell) under the name `zip'[2].

- zip() with one argument. The BDFL strongly prefers that this
  return a list of 1-tuples.

- Should zip() be included in the builtins module or should it be
  in a separate generators module (possibly with other candidate
  functions like irange())?

- Padding short sequences with different values. A suggestion has
  been made to allow a `padtuple' (probably better called `pads'
  or `padseq') argument similar to `pad'.  This sequence must have
  a length equal to the number of sequences given.  It is a
  sequence of the individual pad values to use for each sequence,
  should it be shorter than the maximum length.

  One problem is what to do if `padtuple' itself isn't of the
  right length?  A TypeError seems to be the only choice here.

  How do `pad' and `padtuple' interact?  Perhaps if padtuple
  were too short, it could use pad as a fallback.  padtuple would
  always override pad if both were given.

- Inner and outer container control. An earlier version of this
  PEP contains a rather lengthy discussion on a feature that some
  people wanted, namely the ability to control what the inner and
  outer container types were (they are tuples and lists,
  respectively, in this version of the PEP).  Given the simplified
  API and implementation, this elaboration is rejected.  For a
  more detailed analysis, see version 1.7 of this PEP.

References
@@ -409,6 +206,7 @@ References
TBD: URL to python-dev archives
+
Copyright
This document has been placed in the public domain.