[Python-checkins] CVS: python/nondist/peps pep-0201.txt,1.1,1.2

Barry Warsaw python-dev@python.org
Mon, 17 Jul 2000 11:49:25 -0700


Update of /cvsroot/python/python/nondist/peps
In directory slayer.i.sourceforge.net:/tmp/cvs-serv9849

Modified Files:
	pep-0201.txt 
Log Message:
Latest update.

After consultation with Guido, zip() is chosen as the name of this
built-in.

Added a __len__() method to the reference implementation.

Added a `Rejected Elaborations' section to talk about suggestions from
the list that I've rejected (and the reasoning behind the rejection).

Also: rewrite of paragraph 1 under "Standard For-Loops" for clarity;
Spelling and grammar fixes; use a References section.


Index: pep-0201.txt
===================================================================
RCS file: /cvsroot/python/python/nondist/peps/pep-0201.txt,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -r1.1 -r1.2
*** pep-0201.txt	2000/07/13 06:33:08	1.1
--- pep-0201.txt	2000/07/17 18:49:21	1.2
***************
*** 26,32 ****
      as `parallel for loops'.  A standard for-loop in Python iterates
      over every element in the sequence until the sequence is
!     exhausted.  The for-loop can also be explicitly exited with a
!     `break' statement, and for-loops can have else: clauses, but these
!     is has no bearing on this PEP.
  
      For-loops can iterate over built-in types such as lists and
--- 26,33 ----
      as `parallel for loops'.  A standard for-loop in Python iterates
      over every element in the sequence until the sequence is
!     exhausted.  A `break' statement inside the loop suite causes an
!     explicit loop exit.  For-loops also have else: clauses which get
!     executed when the loop exits normally (i.e. not by execution of a
!     break).
  
      For-loops can iterate over built-in types such as lists and
***************
*** 36,46 ****
      monotonically increasing index starting at 0, and this method
      should raise an IndexError when the sequence is exhausted.  This
!     protocol is current undocumented -- a defect in Python's
      documentation hopefully soon corrected.
  
!     For loops are described in the language reference manual here
!     http://www.python.org/doc/devel/ref/for.html
  
!     An example for-loop
  
      >>> for i in (1, 2, 3): print i
--- 37,47 ----
      monotonically increasing index starting at 0, and this method
      should raise an IndexError when the sequence is exhausted.  This
!     protocol is currently undocumented -- a defect in Python's
      documentation hopefully soon corrected.
  
!     For-loops are described in the Python language reference
!     manual[1].
  
!     An example for-loop:
  
      >>> for i in (1, 2, 3): print i
***************
*** 89,93 ****
      - The use of the magic `None' first argument is non-obvious.
  
!     - Its has arbitrary, often unintended, and inflexible semantics
        when the lists are not of the same length: the shorter sequences
        are padded with `None'.
--- 90,94 ----
      - The use of the magic `None' first argument is non-obvious.
  
!     - It has arbitrary, often unintended, and inflexible semantics
        when the lists are not of the same length: the shorter sequences
        are padded with `None'.
***************
*** 111,119 ****
      The proposed solution is to introduce a new built-in sequence
      generator function, available in the __builtin__ module.  This
!     function is to be called `marry' and has the following signature:
  
!     marry(seqa, [seqb, [...]], [pad=<value>])
  
!     marry() takes one or more sequences and weaves their elements
      together, just as map(None, ...) does with sequences of equal
      length.  The optional keyword argument `pad', if supplied, is a
--- 112,120 ----
      The proposed solution is to introduce a new built-in sequence
      generator function, available in the __builtin__ module.  This
!     function is to be called `zip' and has the following signature:
  
!     zip(seqa, [seqb, [...]], [pad=<value>])
  
!     zip() takes one or more sequences and weaves their elements
      together, just as map(None, ...) does with sequences of equal
      length.  The optional keyword argument `pad', if supplied, is a
***************
*** 123,129 ****
  
      It is not possible to pad short lists with different pad values,
!     nor will marry() ever raise an exception with lists of different
!     lengths.  To accomplish both of these, the sequences must be
!     checked and processed before the call to marry().
  
  
--- 124,130 ----
  
      It is not possible to pad short lists with different pad values,
!     nor will zip() ever raise an exception with lists of different
!     lengths.  To accomplish either behavior, the sequences must be
!     checked and processed before the call to zip().
  
  
***************
*** 131,135 ****
  Lazy Execution
  
!     For performance purposes, marry() does not construct the list of
      tuples immediately.  Instead it instantiates an object that
      implements a __getitem__() method and conforms to the informal
--- 132,136 ----
  Lazy Execution
  
!     For performance purposes, zip() does not construct the list of
      tuples immediately.  Instead it instantiates an object that
      implements a __getitem__() method and conforms to the informal
***************
*** 149,171 ****
      >>> d = (12, 13)
  
!     >>> marry(a, b)
      [(1, 5), (2, 6), (3, 7), (4, 8)]
  
!     >>> marry(a, d)
      [(1, 12), (2, 13)]
  
!     >>> marry(a, d, pad=0)
      [(1, 12), (2, 13), (3, 0), (4, 0)]
      
!     >>> marry(a, d, pid=0)
      Traceback (most recent call last):
        File "<stdin>", line 1, in ?
!       File "/usr/tmp/python-iKAOxR", line 11, in marry
      TypeError: unexpected keyword arguments
      
!     >>> marry(a, b, c, d)
      [(1, 5, 9, 12), (2, 6, 10, 13)]
  
!     >>> marry(a, b, c, d, pad=None)
      [(1, 5, 9, 12), (2, 6, 10, 13), (3, 7, 11, None), (4, 8, None, None)]
      >>> map(None, a, b, c, d)
--- 150,172 ----
      >>> d = (12, 13)
  
!     >>> zip(a, b)
      [(1, 5), (2, 6), (3, 7), (4, 8)]
  
!     >>> zip(a, d)
      [(1, 12), (2, 13)]
  
!     >>> zip(a, d, pad=0)
      [(1, 12), (2, 13), (3, 0), (4, 0)]
      
!     >>> zip(a, d, pid=0)
      Traceback (most recent call last):
        File "<stdin>", line 1, in ?
!       File "/usr/tmp/python-iKAOxR", line 11, in zip
      TypeError: unexpected keyword arguments
      
!     >>> zip(a, b, c, d)
      [(1, 5, 9, 12), (2, 6, 10, 13)]
  
!     >>> zip(a, b, c, d, pad=None)
      [(1, 5, 9, 12), (2, 6, 10, 13), (3, 7, 11, None), (4, 8, None, None)]
      >>> map(None, a, b, c, d)
***************
*** 176,185 ****
  Reference Implementation
  
!     Here is a reference implementation, in Python of the marry()
      built-in function and helper class.  These would ultimately be
      replaced by equivalent C code.
  
!     class _Marriage:
          def __init__(self, args, kws):
              self.__padgiven = 0
              if kws.has_key('pad'):
--- 177,187 ----
  Reference Implementation
  
!     Here is a reference implementation, in Python, of the zip()
      built-in function and helper class.  These would ultimately be
      replaced by equivalent C code.
  
!     class _Zipper:
          def __init__(self, args, kws):
+             # Defaults
              self.__padgiven = 0
              if kws.has_key('pad'):
***************
*** 187,190 ****
--- 189,193 ----
                  self.__pad = kws['pad']
                  del kws['pad']
+             # Assert no unknown arguments are left
              if kws:
                  raise TypeError('unexpected keyword arguments')
***************
*** 207,210 ****
--- 210,230 ----
              return tuple(ret)
  
+         def __len__(self):
+             # If we're padding, then len is the length of the longest sequence,
+             # otherwise it's the length of the shortest sequence.
+             if not self.__padgiven:
+                 shortest = -1
+                 for s in self.__sequences:
+                     slen = len(s)
+                     if shortest < 0 or slen < shortest:
+                         shortest = slen
+                 return shortest
+             longest = 0
+             for s in self.__sequences:
+                 slen = len(s)
+                 if slen > longest:
+                     longest = slen
+             return longest
+ 
          def __str__(self):
              ret = []
***************
*** 220,242 ****
  
  
!     def marry(*args, **kws):
!         return _Marriage(args, kws)
  
  
  
  Open Issues
  
!     What should "marry(a)" do?
  
!     Given a = (1, 2, 3), should marry(a) return [(1,), (2,), (3,)] or
!     should it return [1, 2, 3]?  The first is more consistent with the
!     description given above, while the latter is what map(None, a)
!     does, and may be more consistent with user expectation.
  
!     The latter interpretation requires special casing, which is not
!     present in the reference implementation.  It returns
  
!     >>> marry(a)
!     [(1,), (2,), (3,), (4,)]
  
  
--- 240,367 ----
  
  
!     def zip(*args, **kws):
!         return _Zipper(args, kws)
  
  
  
+ Rejected Elaborations
+ 
+     Some people have suggested that the user be able to specify the
+     type of the inner and outer containers for the zipped sequence.
+     This would be specified by additional keyword arguments to zip(),
+     named `inner' and `outer'.
+ 
+     This elaboration is rejected for several reasons.  First, there
+     really is no outer container, even though there appears to be an
+     outer list container in the example above.  This is simply an
+     artifact of the repr() of the zipped object.  User code can do its
+     own looping over the zipped object via __getitem__(), and build
+     any type of outer container for the fully evaluated, concrete
+     sequence.  For example, to build a zipped object with lists as an
+     outer container, use
+ 
+         >>> list(zip(sequence_a, sequence_b, sequence_c))
+ 
+     for a tuple outer container, use
+     
+         >>> tuple(zip(sequence_a, sequence_b, sequence_c))
+ 
+     This type of construction will usually not be necessary though,
+     since it is expected that zipped objects will most often appear in
+     for-loops.
+ 
+     Second, allowing the user to specify the inner container
+     introduces needless complexity and arbitrary decisions.  You might
+     imagine that instead of the default tuple inner container, the
+     user could prefer a list, or a dictionary, or instances of some
+     sequence-like class.
+ 
+     One problem is the API.  Should the argument to `inner' be a type
+     or a template object?  For flexibility, the argument should
+     probably be a type object (e.g. TupleType, ListType, DictType), or
+     a class.  For classes, the implementation could just pass the zip
+     element to the constructor.  But what about built-in types that
+     don't have constructors?  They would have to be special-cased in
+     the implementation (e.g. what is the constructor for TupleType?
+     The tuple() built-in).
+ 
+     Another problem that arises is for zips greater than length two.
+     Say you had three sequences and you wanted the inner type to be a
+     dictionary.  What would the semantics of the following be?
+ 
+         >>> zip(sequence_a, sequence_b, sequence_c, inner=DictType)
+ 
+     Would the key be (element_a, element_b) and the value be
+     element_c, or would the key be element_a and the value be
+     (element_b, element_c)?  Or should an exception be thrown?
+ 
+     This suggests that the specification of the inner container type
+     is needless complexity.  It isn't likely that the inner container
+     will need to be specified very often, and it is easy to roll your
+     own should you need it.  Tuples are chosen for the inner container
+     type due to their (slight) memory footprint and performance
+     advantages.
+ 
+ 
+ 
  Open Issues
+ 
+     - What should "zip(a)" do?  Given
+ 
+       a = (1, 2, 3); zip(a)
+ 
+       three outcomes are possible.
+ 
+       1) Returns [(1,), (2,), (3,)]
+ 
+          Pros: no special casing in the implementation or in user
+          code, and is more consistent with the description of its
+          semantics.  Cons: this isn't what map(None, a) would return,
+          and may be counter to user expectations.
  
!       2) Returns [1, 2, 3]
  
!          Pros: consistency with map(None, a), and simpler code for
!          for-loops, e.g.
  
!          for i in zip(a):
! 
!          instead of
! 
!          for (i,) in zip(a):
! 
!          Cons: too much complexity and special casing for what should
!          be a relatively rare usage pattern.
! 
!       3) Raises TypeError
! 
!          Pros: None
! 
!          Cons: needless restriction
! 
!       Current scoring seems to generally favor outcome 1.
! 
!     - The name of the built-in `zip' may cause some initial confusion
!       with the zip compression algorithm.  Other suggestions include
!       (but are not limited to!): marry, weave, parallel, lace, braid,
!       interlace, permute, furl, tuples, lists, stitch, collate, knit,
!       plait, and with.  All have disadvantages, and there is no clear
!       unanimous choice, therefore the decision was made to go with
!       `zip' because the same functionality is available in other
!       languages (e.g. Haskell) under the name `zip'[2].
! 
! 
! 
! References
! 
!     [1] http://www.python.org/doc/devel/ref/for.html    
!     [2] http://www.haskell.org/onlinereport/standard-prelude.html#$vzip
! 
!     TBD: URL to python-dev archives
! 
! 
! Copyright
  
!     This document has been placed in the public domain.
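
The padding and length rules described in revision 1.2 can be
approximated in present-day Python 3 as a rough sketch.  This is not
the reference implementation from the diff: the names pep201_zip,
pep201_len and _MISSING are hypothetical, the result is a concrete list
rather than a lazy _Zipper object, and the sample tuples a, b and c are
inferred from the outputs quoted in the diff (only d appears there
verbatim).  It relies on the built-in zip() and on
itertools.zip_longest() with its fillvalue argument.

```python
from itertools import zip_longest

_MISSING = object()  # hypothetical sentinel: tells "no pad given" apart from pad=None


def pep201_zip(*seqs, pad=_MISSING):
    """Approximate the zip() proposed in revision 1.2 (hypothetical helper)."""
    if not seqs:
        # The PEP text says zip() takes one or more sequences.
        raise TypeError("pep201_zip() requires at least one sequence")
    if pad is _MISSING:
        # No pad: stop at the shortest sequence.
        return list(zip(*seqs))
    # Pad given: run to the longest sequence, filling gaps with pad.
    return list(zip_longest(*seqs, fillvalue=pad))


def pep201_len(*seqs, pad=_MISSING):
    """Length rule of the new __len__(): shortest without pad, longest with."""
    lengths = [len(s) for s in seqs]
    return min(lengths) if pad is _MISSING else max(lengths)


# a, b and c are inferred from the outputs in the diff; d is quoted there.
a = (1, 2, 3, 4)
b = (5, 6, 7, 8)
c = (9, 10, 11)
d = (12, 13)

print(pep201_zip(a, d))                  # [(1, 12), (2, 13)]
print(pep201_zip(a, d, pad=0))           # [(1, 12), (2, 13), (3, 0), (4, 0)]
print(pep201_zip(a, b, c, d, pad=None))
print(pep201_len(a, b, c, d))            # 2
print(pep201_len(a, b, c, d, pad=None))  # 4
```

Because pad is declared keyword-only, a misspelled keyword such as
pid=0 raises TypeError without any explicit check, which matches the
behavior shown in the examples.  A single-sequence call such as
pep201_zip(a) yields 1-tuples, i.e. outcome 1 under Open Issues.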