peps: e4678d0f2cbb
--- a/pep-0201.txt
+++ b/pep-0201.txt
@@ -84,31 +84,17 @@ The Proposed Solution
generator function, available in the __builtin__ module. This
function is to be called `zip' and has the following signature:
zip() takes one or more sequences and weaves their elements together, just as map(None, ...) does with sequences of equal
- length. The optional keyword argument `pad', if supplied, is a
- value used to pad all shorter sequences to the length of the
- longest sequence. If `pad' is omitted, then weaving stops when
- the shortest sequence is exhausted.
- It is not possible to pad short lists with different pad values,
- nor will zip() ever raise an exception with lists of different
- lengths. To accomplish either behavior, the sequences must be
- checked and processed before the call to zip() -- but see the Open
- Issues below for more discussion.
- For performance purposes, zip() does not construct the list of
- tuples immediately. Instead it instantiates an object that
- implements a __getitem__() method and conforms to the informal
- for-loop protocol. This method constructs the individual tuples
- on demand.
Examples
@@ -127,23 +113,9 @@ Examples
>>> zip(a, d)
[(1, 12), (2, 13)]
- [(1, 12), (2, 13), (3, 0), (4, 0)]
- Traceback (most recent call last):
    File "<stdin>", line 1, in ?
    File "/usr/tmp/python-iKAOxR", line 11, in zip
- TypeError: unexpected keyword arguments
- [(1, 5, 9, 12), (2, 6, 10, 13), (3, 7, 11, None), (4, 8, None, None)]
- [(1, 5, 9, 12), (2, 6, 10, 13), (3, 7, 11, None), (4, 8, None, None)]
- Note that when the sequences are of the same length, zip() is reversible:
@@ -171,235 +143,60 @@ Reference Implementation
built-in function and helper class. These would ultimately be
replaced by equivalent C code.
- class _Zipper:
      def __init__(self, args, kws):
          # Defaults
          self.__padgiven = 0
          if kws.has_key('pad'):
              self.__padgiven = 1
              self.__pad = kws['pad']
              del kws['pad']
          # Assert no unknown arguments are left
          if kws:
              raise TypeError('unexpected keyword arguments')
          self.__sequences = args
          self.__seqlen = len(args)

      def __getitem__(self, i):
          if not self.__sequences:
              raise IndexError
          ret = []
          exhausted = 0
          for s in self.__sequences:
              try:
                  ret.append(s[i])
              except IndexError:
                  if not self.__padgiven:
                      raise
                  exhausted = exhausted + 1
                  if exhausted == self.__seqlen:
                      raise
                  ret.append(self.__pad)
          return tuple(ret)

      def __len__(self):
          # If we're padding, then len is the length of the longest sequence,
          # otherwise it's the length of the shortest sequence.
          if not self.__padgiven:
              shortest = -1
              for s in self.__sequences:
                  slen = len(s)
                  if shortest < 0 or slen < shortest:
                      shortest = slen
              if shortest < 0:
                  return 0
              return shortest
          longest = 0
          for s in self.__sequences:
              slen = len(s)
              if slen > longest:
                  longest = slen
          return longest

      def __cmp__(self, other):
          i = 0
          smore = 1
          omore = 1
          while 1:
              try:
                  si = self[i]
              except IndexError:
                  smore = 0
              try:
                  oi = other[i]
              except IndexError:
                  omore = 0
              if not smore and not omore:
                  return 0
              elif not smore:
                  return -1
              elif not omore:
                  return 1
              test = cmp(si, oi)
              if test:
                  return test
              i = i + 1

      def __str__(self):
          ret = []
          i = 0
          while 1:
              try:
                  ret.append(self[i])
              except IndexError:
                  break
              i = i + 1
          return str(ret)
      __repr__ = __str__
- def zip(*args):
      if not args:
          raise TypeError('zip() expects one or more sequence arguments')
      ret = []
      # find the length of the shortest sequence
      shortest = min(*map(len, args))
      for i in range(shortest):
          item = []
          for s in args:
              item.append(s[i])
          ret.append(tuple(item))
      return ret
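
For readers who want to try the truncating behaviour on a current interpreter, here is a minimal Python 3 sketch of the list-returning function above. The name `zip_list` is hypothetical (chosen so the built-in is not shadowed). It uses `min(map(len, args))` in place of the `min(*map(len, args))` spelling, because the starred form hands `min()` a bare integer when only one sequence is passed. Treat it as an illustration under those assumptions, not as the PEP's reference code.

```python
def zip_list(*args):
    """Hypothetical Python 3 rendering of the truncating zip() shown above."""
    if not args:
        raise TypeError('zip_list() expects one or more sequence arguments')
    # Weaving stops at the length of the shortest sequence.
    shortest = min(map(len, args))
    result = []
    for i in range(shortest):
        # Take the i-th element of every sequence and pack them into a tuple.
        result.append(tuple(s[i] for s in args))
    return result


a = (1, 2, 3, 4)
d = (12, 13)
print(zip_list(a, d))   # [(1, 12), (2, 13)]
print(zip_list(d))      # [(12,), (13,)]
```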
- Some people have suggested that the user be able to specify the
- type of the inner and outer containers for the zipped sequence.
- This would be specified by additional keyword arguments to zip(),
- named `inner' and `outer'.
- This elaboration is rejected for several reasons. First, there
- really is no outer container, even though there appears to be an
- outer list container in the example above. This is simply an
- artifact of the repr() of the zipped object. User code can do its
- own looping over the zipped object via __getitem__(), and build
- any type of outer container for the fully evaluated, concrete
- sequence. For example, to build a zipped object with lists as an
- outer container, use
    >>> list(zip(sequence_a, sequence_b, sequence_c))
- This type of construction will usually not be necessary though,
- since it is expected that zipped objects will most often appear in
- for-loops.
- Second, allowing the user to specify the inner container
- introduces needless complexity and arbitrary decisions. You might
- imagine that instead of the default tuple inner container, the
- user could prefer a list, or a dictionary, or instances of some
- sequence-like class.
- One problem is the API. Should the argument to `inner' be a type
- or a template object? For flexibility, the argument should
- probably be a type object (i.e. TupleType, ListType, DictType), or
- a class. For classes, the implementation could just pass the zip
- element to the constructor. But what about built-in types that
- don't have constructors? They would have to be special-cased in
- the implementation (i.e. what is the constructor for TupleType?
- The tuple() built-in).
- Another problem that arises is for zips greater than length two.
- Say you had three sequences and you wanted the inner type to be a
- dictionary. What would the semantics of the following be?
    >>> zip(sequence_a, sequence_b, sequence_c, inner=DictType)

open issue listing 20+ proposed alternative names to zip(). In
the face of no overwhelmingly better choice, the BDFL strongly
prefers zip() due to its Haskell[2] heritage. See version 1.7
of this PEP for the list of alternatives.
- Would the key be (element_a, element_b) and the value be
- element_c, or would the key be element_a and the value be
- (element_b, element_c)? Or should an exception be thrown?
- This suggests that the specification of the inner container type
- is needless complexity. It isn't likely that the inner container
- will need to be specified very often, and it is easy to roll your
- own should you need it. Tuples are chosen for the inner container
- type due to their (slight) memory footprint and performance
- advantages.
should return a real list, with an xzip() lazy evaluator added
later if necessary.

a = (1, 2, 3); zip(a)

optional `pad' keyword argument, which would be used when the
argument sequences were not the same length. This is similar
behavior to the map(None, ...) semantics except that the user
would be able to specify a pad object. This has been rejected by
the BDFL in favor of always truncating to the shortest sequence.
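
As a point of comparison only, the two behaviours discussed here can be seen side by side on a current Python 3 interpreter, where the standard library's `itertools.zip_longest` provides padding with a `fillvalue` argument. This is a sketch on modern Python, not part of this changeset; the sample values for `a` and `d` are taken from the Examples section.

```python
from itertools import zip_longest

a = (1, 2, 3, 4)
d = (12, 13)

# Built-in zip() always truncates to the shortest sequence.
truncated = list(zip(a, d))

# Padding is a separate function; fillvalue plays the role of `pad'.
padded = list(zip_longest(a, d, fillvalue=0))

print(truncated)  # [(1, 12), (2, 13)]
print(padded)     # [(1, 12), (2, 13), (3, 0), (4, 0)]
```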
three outcomes are possible.

1) Returns [(1,), (2,), (3,)]

   Pros: no special casing in the implementation or in user
   code, and is more consistent with the description of its
   semantics. Cons: this isn't what map(None, a) would return,
   and may be counter to user expectations.

2) Returns [1, 2, 3]

   Pros: consistency with map(None, a), and simpler code for
   for-loops, e.g.

       for i in zip(a):

zip() return a built-in object that performed lazy evaluation
using __getitem__() protocol. This has been strongly rejected
by the BDFL in favor of returning a real Python list. If lazy
evaluation is desired in the future, the BDFL suggests an xzip()
function be added.

   instead of

       for (i,) in zip(a):

   Cons: too much complexity and special casing for what should
   be a relatively rare usage pattern.

3) Raises TypeError

   Pros: zip(a) doesn't make much sense and could be confusing
   to explain.

   Cons: needless restriction

Current scoring seems to generally favor outcome 1.
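
To make the difference between outcomes 1 and 2 concrete, the short sketch below shows what outcome 1 looks like in user code. It assumes a current Python 3 interpreter, where the built-in zip() returns an iterator (hence the list() call) and yields 1-tuples for a single argument.

```python
a = (1, 2, 3)

# Outcome 1: every element is wrapped in a 1-tuple.
singles = list(zip(a))
print(singles)    # [(1,), (2,), (3,)]

# The loop form that outcome 2 was meant to avoid: each 1-tuple
# has to be unpacked explicitly.
unpacked = [i for (i,) in zip(a)]
print(unpacked)   # [1, 2, 3]
```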
Along similar lines, zip() with no arguments (or zip() with just
a pad argument) can have ambiguous semantics. Should this
return no elements or an infinite number? For these reasons,
raising a TypeError exception in this case makes the most
sense.
with the zip compression algorithm. Other suggestions include
(but are not limited to!): marry, weave, parallel, lace, braid,
interlace, permute, furl, tuples, lists, stitch, collate, knit,
plait, fold, with, mktuples, maketuples, totuples, gentuples,
tupleorama.

All have disadvantages, and there is no clear unanimous choice,
therefore the decision was made to go with `zip' because the
same functionality is available in other languages
(e.g. Haskell) under the name `zip'[2].
in a separate generators module (possibly with other candidate
functions like irange())?

been made to allow a `padtuple' (probably better called `pads'
or `padseq') argument similar to `pad'. This sequence must have
a length equal to the number of sequences given. It is a
sequence of the individual pad values to use for each sequence,
should it be shorter than the maximum length.

One problem is what to do if `padtuple' itself isn't of the
right length? A TypeError seems to be the only choice here.

How do `pad' and `padtuple' interact? Perhaps if padtuple
were too short, it could use pad as a fallback. padtuple would
always override pad if both were given.
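
The per-sequence padding idea described above can be sketched in a few lines of modern Python. Everything here is hypothetical: the function name `zip_padded` and the keyword `pads` are invented for illustration, and the sketch takes the strict reading in which a pad sequence of the wrong length raises TypeError rather than falling back to a single `pad' value.

```python
def zip_padded(*seqs, pads):
    """Hypothetical sketch of the `padtuple' idea: one pad value per sequence."""
    if len(pads) != len(seqs):
        # Strict reading of the text: a wrong-length pad sequence is an error.
        raise TypeError('pads must have one entry per sequence')
    longest = max(map(len, seqs), default=0)
    return [
        # Use the sequence's own element while it lasts, then its pad value.
        tuple(s[i] if i < len(s) else pad for s, pad in zip(seqs, pads))
        for i in range(longest)
    ]


print(zip_padded((1, 2, 3), (12,), pads=(0, None)))
# [(1, 12), (2, None), (3, None)]
```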
PEP contains a rather lengthy discussion on a feature that some
people wanted, namely the ability to control what the inner and
outer container types were (they are tuples and list
respectively in this version of the PEP). Given the simplified
API and implementation, this elaboration is rejected. For a
more detailed analysis, see version 1.7 of this PEP.
References
@@ -409,6 +206,7 @@ References
TBD: URL to python-dev archives
+
Copyright
This document has been placed in the public domain.