[Python-checkins] CVS: python/nondist/peps pep-0201.txt,1.1,1.2
Barry Warsaw
python-dev@python.org
Mon, 17 Jul 2000 11:49:25 -0700
Update of /cvsroot/python/python/nondist/peps
In directory slayer.i.sourceforge.net:/tmp/cvs-serv9849
Modified Files:
pep-0201.txt
Log Message:
Latest update.
After consultation with Guido, zip() is chosen as the name of this
built-in.
Added a __len__() method to the reference implementation.
Added a `Rejected Elaborations' section to talk about suggestions from
the list that I've rejected (and the reasoning behind the rejection).
Also: rewrote paragraph 1 under "Standard For-Loops" for clarity;
fixed spelling and grammar; added a References section.
Index: pep-0201.txt
===================================================================
RCS file: /cvsroot/python/python/nondist/peps/pep-0201.txt,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -r1.1 -r1.2
*** pep-0201.txt 2000/07/13 06:33:08 1.1
--- pep-0201.txt 2000/07/17 18:49:21 1.2
***************
*** 26,32 ****
as `parallel for loops'. A standard for-loop in Python iterates
over every element in the sequence until the sequence is
! exhausted. The for-loop can also be explicitly exited with a
! `break' statement, and for-loops can have else: clauses, but these
! is has no bearing on this PEP.
For-loops can iterate over built-in types such as lists and
--- 26,33 ----
as `parallel for loops'. A standard for-loop in Python iterates
over every element in the sequence until the sequence is
! exhausted. A `break' statement inside the loop suite causes an
! explicit loop exit. For-loops also have else: clauses which get
! executed when the loop exits normally (i.e. not by execution of a
! break).
For-loops can iterate over built-in types such as lists and
***************
*** 36,46 ****
monotonically increasing index starting at 0, and this method
should raise an IndexError when the sequence is exhausted. This
! protocol is current undocumented -- a defect in Python's
documentation hopefully soon corrected.
! For loops are described in the language reference manual here
! http://www.python.org/doc/devel/ref/for.html
! An example for-loop
>>> for i in (1, 2, 3): print i
--- 37,47 ----
monotonically increasing index starting at 0, and this method
should raise an IndexError when the sequence is exhausted. This
! protocol is currently undocumented -- a defect in Python's
documentation hopefully soon corrected.
! For-loops are described in the Python language reference
! manual[1].
! An example for-loop:
>>> for i in (1, 2, 3): print i
***************
*** 89,93 ****
- The use of the magic `None' first argument is non-obvious.
! - Its has arbitrary, often unintended, and inflexible semantics
when the lists are not of the same length: the shorter sequences
are padded with `None'.
--- 90,94 ----
- The use of the magic `None' first argument is non-obvious.
! - It has arbitrary, often unintended, and inflexible semantics
when the lists are not of the same length: the shorter sequences
are padded with `None'.
***************
*** 111,119 ****
The proposed solution is to introduce a new built-in sequence
generator function, available in the __builtin__ module. This
! function is to be called `marry' and has the following signature:
! marry(seqa, [seqb, [...]], [pad=<value>])
! marry() takes one or more sequences and weaves their elements
together, just as map(None, ...) does with sequences of equal
length. The optional keyword argument `pad', if supplied, is a
--- 112,120 ----
The proposed solution is to introduce a new built-in sequence
generator function, available in the __builtin__ module. This
! function is to be called `zip' and has the following signature:
! zip(seqa, [seqb, [...]], [pad=<value>])
! zip() takes one or more sequences and weaves their elements
together, just as map(None, ...) does with sequences of equal
length. The optional keyword argument `pad', if supplied, is a
***************
*** 123,129 ****
It is not possible to pad short lists with different pad values,
! nor will marry() ever raise an exception with lists of different
! lengths. To accomplish both of these, the sequences must be
! checked and processed before the call to marry().
--- 124,130 ----
It is not possible to pad short lists with different pad values,
! nor will zip() ever raise an exception with lists of different
! lengths. To accomplish either behavior, the sequences must be
! checked and processed before the call to zip().
***************
*** 131,135 ****
Lazy Execution
! For performance purposes, marry() does not construct the list of
tuples immediately. Instead it instantiates an object that
implements a __getitem__() method and conforms to the informal
--- 132,136 ----
Lazy Execution
! For performance purposes, zip() does not construct the list of
tuples immediately. Instead it instantiates an object that
implements a __getitem__() method and conforms to the informal
***************
*** 149,171 ****
>>> d = (12, 13)
! >>> marry(a, b)
[(1, 5), (2, 6), (3, 7), (4, 8)]
! >>> marry(a, d)
[(1, 12), (2, 13)]
! >>> marry(a, d, pad=0)
[(1, 12), (2, 13), (3, 0), (4, 0)]
! >>> marry(a, d, pid=0)
Traceback (most recent call last):
File "<stdin>", line 1, in ?
! File "/usr/tmp/python-iKAOxR", line 11, in marry
TypeError: unexpected keyword arguments
! >>> marry(a, b, c, d)
[(1, 5, 9, 12), (2, 6, 10, 13)]
! >>> marry(a, b, c, d, pad=None)
[(1, 5, 9, 12), (2, 6, 10, 13), (3, 7, 11, None), (4, 8, None, None)]
>>> map(None, a, b, c, d)
--- 150,172 ----
>>> d = (12, 13)
! >>> zip(a, b)
[(1, 5), (2, 6), (3, 7), (4, 8)]
! >>> zip(a, d)
[(1, 12), (2, 13)]
! >>> zip(a, d, pad=0)
[(1, 12), (2, 13), (3, 0), (4, 0)]
! >>> zip(a, d, pid=0)
Traceback (most recent call last):
File "<stdin>", line 1, in ?
! File "/usr/tmp/python-iKAOxR", line 11, in zip
TypeError: unexpected keyword arguments
! >>> zip(a, b, c, d)
[(1, 5, 9, 12), (2, 6, 10, 13)]
! >>> zip(a, b, c, d, pad=None)
[(1, 5, 9, 12), (2, 6, 10, 13), (3, 7, 11, None), (4, 8, None, None)]
>>> map(None, a, b, c, d)
***************
*** 176,185 ****
Reference Implementation
! Here is a reference implementation, in Python of the marry()
built-in function and helper class. These would ultimately be
replaced by equivalent C code.
! class _Marriage:
def __init__(self, args, kws):
self.__padgiven = 0
if kws.has_key('pad'):
--- 177,187 ----
Reference Implementation
! Here is a reference implementation, in Python, of the zip()
built-in function and helper class. These would ultimately be
replaced by equivalent C code.
! class _Zipper:
def __init__(self, args, kws):
+ # Defaults
self.__padgiven = 0
if kws.has_key('pad'):
***************
*** 187,190 ****
--- 189,193 ----
self.__pad = kws['pad']
del kws['pad']
+ # Assert no unknown arguments are left
if kws:
raise TypeError('unexpected keyword arguments')
***************
*** 207,210 ****
--- 210,230 ----
return tuple(ret)
+ def __len__(self):
+ # If we're padding, then len is the length of the longest sequence,
+ # otherwise it's the length of the shortest sequence.
+ if not self.__padgiven:
+ shortest = -1
+ for s in self.__sequences:
+ slen = len(s)
+ if shortest < 0 or slen < shortest:
+ shortest = slen
+ return shortest
+ longest = 0
+ for s in self.__sequences:
+ slen = len(s)
+ if slen > longest:
+ longest = slen
+ return longest
+
def __str__(self):
ret = []
***************
*** 220,242 ****
! def marry(*args, **kws):
! return _Marriage(args, kws)
Open Issues
! What should "marry(a)" do?
! Given a = (1, 2, 3), should marry(a) return [(1,), (2,), (3,)] or
! should it return [1, 2, 3]? The first is more consistent with the
! description given above, while the latter is what map(None, a)
! does, and may be more consistent with user expectation.
! The latter interpretation requires special casing, which is not
! present in the reference implementation. It returns
! >>> marry(a)
! [(1,), (2,), (3,), (4,)]
--- 240,367 ----
! def zip(*args, **kws):
! return _Zipper(args, kws)
+ Rejected Elaborations
+
+ Some people have suggested that the user be able to specify the
+ type of the inner and outer containers for the zipped sequence.
+ This would be specified by additional keyword arguments to zip(),
+ named `inner' and `outer'.
+
+ This elaboration is rejected for several reasons. First, there
+ really is no outer container, even though there appears to be an
+ outer list container in the example above. This is simply an
+ artifact of the repr() of the zipped object. User code can do its
+ own looping over the zipped object via __getitem__(), and build
+ any type of outer container for the fully evaluated, concrete
+ sequence. For example, to build a zipped object with lists as an
+ outer container, use
+
+ >>> list(zip(sequence_a, sequence_b, sequence_c))
+
+ for a tuple outer container, use
+
+ >>> tuple(zip(sequence_a, sequence_b, sequence_c))
+
+ This type of construction will usually not be necessary though,
+ since it is expected that zipped objects will most often appear in
+ for-loops.
+
+ Second, allowing the user to specify the inner container
+ introduces needless complexity and arbitrary decisions. You might
+ imagine that instead of the default tuple inner container, the
+ user could prefer a list, or a dictionary, or instances of some
+ sequence-like class.
+
+ One problem is the API. Should the argument to `inner' be a type
+ or a template object? For flexibility, the argument should
+ probably be a type object (e.g. TupleType, ListType, DictType), or
+ a class. For classes, the implementation could just pass the zip
+ element to the constructor. But what about built-in types that
+ don't have constructors? They would have to be special-cased in
+ the implementation (e.g. what is the constructor for TupleType?
+ The tuple() built-in).
+
+ Another problem arises for zips of more than two sequences.
+ Say you had three sequences and you wanted the inner type to be a
+ dictionary. What would the semantics of the following be?
+
+ >>> zip(sequence_a, sequence_b, sequence_c, inner=DictType)
+
+ Would the key be (element_a, element_b) and the value be
+ element_c, or would the key be element_a and the value be
+ (element_b, element_c)? Or should an exception be thrown?
+
+ This suggests that the specification of the inner container type
+ is needless complexity. It isn't likely that the inner container
+ will need to be specified very often, and it is easy to roll your
+ own should you need it. Tuples are chosen for the inner container
+ type due to their (slight) memory footprint and performance
+ advantages.
+
+
+
Open Issues
+
+ - What should "zip(a)" do? Given
+
+ a = (1, 2, 3); zip(a)
+
+ three outcomes are possible.
+
+ 1) Returns [(1,), (2,), (3,)]
+
+ Pros: no special casing in the implementation or in user
+ code, and is more consistent with the description of its
+ semantics. Cons: this isn't what map(None, a) would return,
+ and may be counter to user expectations.
! 2) Returns [1, 2, 3]
! Pros: consistency with map(None, a), and simpler code for
! for-loops, e.g.
! for i in zip(a):
!
! instead of
!
! for (i,) in zip(a):
!
! Cons: too much complexity and special casing for what should
! be a relatively rare usage pattern.
!
! 3) Raises TypeError
!
! Pros: None
!
! Cons: needless restriction
!
! Current scoring seems to generally favor outcome 1.
!
! - The name of the built-in `zip' may cause some initial confusion
! with the zip compression algorithm. Other suggestions include
! (but are not limited to!): marry, weave, parallel, lace, braid,
! interlace, permute, furl, tuples, lists, stitch, collate, knit,
! plait, and with. All have disadvantages, and there is no clear
! unanimous choice. The decision was therefore made to go with
! `zip' because the same functionality is available in other
! languages (e.g. Haskell) under the name `zip'[2].
!
!
!
! References
!
! [1] http://www.python.org/doc/devel/ref/for.html
! [2] http://www.haskell.org/onlinereport/standard-prelude.html#$vzip
!
! TBD: URL to python-dev archives
!
!
! Copyright
! This document has been placed in the public domain.
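
As a rough illustration of the semantics described in this revision
(truncate at the shortest sequence by default, pad out to the longest
when `pad' is given, lazy element access through __getitem__(), and
the new __len__() rule), here is a minimal sketch in present-day
Python 3 syntax. It is not the PEP's _Zipper reference implementation:
the LazyZip class name, the _NO_PAD sentinel, and the keyword-only
`pad' parameter are all assumptions made for this example.

```python
class LazyZip:
    """Hypothetical stand-in for the PEP's helper class (Python 3 syntax)."""

    # Sentinel so that pad=None can be told apart from "no pad given".
    _NO_PAD = object()

    def __init__(self, *sequences, pad=_NO_PAD):
        if not sequences:
            raise TypeError("at least one sequence is required")
        self._sequences = sequences
        self._pad = pad

    def __getitem__(self, i):
        padding = self._pad is not self._NO_PAD
        row = []
        exhausted = 0
        for s in self._sequences:
            try:
                row.append(s[i])
            except IndexError:
                if not padding:
                    # Without padding, the shortest sequence ends the zip.
                    raise
                row.append(self._pad)
                exhausted += 1
        if exhausted == len(self._sequences):
            # With padding, stop only once every sequence has run out.
            raise IndexError(i)
        return tuple(row)

    def __len__(self):
        # Longest sequence when padding, shortest otherwise.
        lengths = [len(s) for s in self._sequences]
        if self._pad is not self._NO_PAD:
            return max(lengths)
        return min(lengths)
```

With a = (1, 2, 3, 4) and d = (12, 13), list(LazyZip(a, d)) yields the
truncated pairs and list(LazyZip(a, d, pad=0)) the padded ones,
mirroring the interactive examples in the PEP text. Tuples are built
only when an index is requested, which is the lazy behavior the PEP
describes.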