[Python-checkins] CVS: python/nondist/peps pep-0255.txt,NONE,1.1

Barry Warsaw bwarsaw@users.sourceforge.net
Tue, 05 Jun 2001 10:11:32 -0700


Update of /cvsroot/python/python/nondist/peps
In directory usw-pr-cvs1:/tmp/cvs-serv22547

Added Files:
	pep-0255.txt 
Log Message:
Magnus Lie Hetland's first version of this PEP.


--- NEW FILE: pep-0255.txt ---
PEP: 255
Title: Simple Generators
Version: $Revision: 1.1 $
Author: nas@python.ca (Neil Schemenauer),
    tim.one@home.com (Tim Peters),
    magnus@hetland.org (Magnus Lie Hetland)
Discussion-To: python-iterators@lists.sourceforge.net
Status: Draft
Type: Standards Track
Requires: 234
Created: 18-May-2001
Python-Version: 2.2
Post-History:


Abstract

    This PEP introduces the concept of generators to Python, as well
    as a new statement used in conjunction with them, the "yield"
    statement.


Motivation

    When a producer function has a hard enough job that it requires
    maintaining state between values produced, most programming languages
    offer no pleasant and efficient solution beyond adding a callback
    function to the producer's argument list, to be called with each value
    produced.

    For example, tokenize.py in the standard library takes this approach:
    the caller must pass a "tokeneater" function to tokenize(), called
    whenever tokenize() finds the next token.  This allows tokenize to be
    coded in a natural way, but programs calling tokenize are typically
    convoluted by the need to remember between callbacks which token(s)
    were seen last.  The tokeneater function in tabnanny.py is a good
    example of that, maintaining a state machine in global variables, to
    remember across callbacks what it has already seen and what it hopes to
    see next.  This was difficult to get working correctly, and is still
    difficult for people to understand.  Unfortunately, that's typical of
    this approach.
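
    To make the contrast concrete, here is a hedged sketch of the
    callback style with illustrative names (it is not tokenize's actual
    interface):  the producer owns the loop, and the consumer has to
    park anything it needs to remember somewhere that survives between
    calls:

        # Hypothetical producer in the callback style:  the producer
        # owns the loop and pushes each value at the consumer.
        def produce(values, callback):
            for v in values:
                callback(v)

        # The consumer must remember what it has already seen in module
        # globals (much as tabnanny's tokeneater does), because each
        # callback invocation starts from scratch.
        last = None
        def eater(v):
            global last
            if last is not None:
                print last, "->", v
            last = v

        produce(range(5), eater)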

    An alternative would have been for tokenize to produce an entire parse
    of the Python program at once, in a large list.  Then tokenize clients
    could be written in a natural way, using local variables and local
    control flow (such as loops and nested if statements) to keep track of
    their state.  But this isn't practical:  programs can be very large, so
    no a priori bound can be placed on the memory needed to materialize the
    whole parse; and some tokenize clients only want to see whether
    something specific appears early in the program (e.g., a future
    statement, or, as is done in IDLE, just the first indented statement),
    and then parsing the whole program first is a severe waste of time.
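
    For comparison, a minimal sketch of the materialize-everything
    alternative (again with illustrative names):  the client code reads
    naturally, but the whole result list is built even if only the
    first item is ever examined.

        # Hypothetical producer that builds the entire result up front.
        def produce_all(values):
            results = []
            for v in values:
                results.append(v)
            return results

        # Natural client code, but no early exit saves any memory.
        for v in produce_all(range(5)):
            print v,
        print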

    Another alternative would be to make tokenize an iterator[1],
    delivering the next token whenever its .next() method is invoked.  This
    is pleasant for the caller in the same way a large list of results
    would be, but without the memory and "what if I want to get out early?"
    drawbacks.  However, this shifts the burden onto tokenize to remember
    *its* state between .next() invocations, and the reader need only
    glance at tokenize.tokenize_loop() to realize what a horrid chore that
    would be.  Or picture a recursive algorithm for producing the nodes of
    a general tree structure:  to cast that into an iterator framework
    requires removing the recursion manually and maintaining the state of
    the traversal by hand.
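
    As a hedged sketch of what that chore looks like even for a trivial
    producer (illustrative names, assuming the iterator protocol of PEP
    234):  every local variable becomes an instance attribute, and the
    control flow becomes bookkeeping inside .next().

        # Hand-rolled iterator counting from 0 up to (but not
        # including) n; a recursive traversal is far messier still.
        class CountUpTo:
            def __init__(self, n):
                self.n = n
                self.i = 0
            def __iter__(self):
                return self
            def next(self):
                if self.i >= self.n:
                    raise StopIteration
                value = self.i
                self.i = self.i + 1
                return value

        for x in CountUpTo(5): print x,
        print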

    A fourth option is to run the producer and consumer in separate
    threads.  This allows both to maintain their states in natural ways,
    and so is pleasant for both.  Indeed, Demo/threads/Generator.py in the
    Python source distribution provides a usable synchronized-communication
    class for doing that in a general way.  This doesn't work on platforms
    without threads, though, and is very slow on platforms that do
    (compared to what is achievable without threads).
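
    For illustration, here is a hedged sketch of the thread option
    built on the standard Queue module rather than on the
    Demo/threads/Generator.py class itself (whose interface is not
    reproduced here):  the producer's code stays natural, but every
    value pays for a synchronized handoff.

        import threading, Queue

        # The producer keeps its state in ordinary locals and natural
        # control flow, handing each value across a synchronized queue.
        def produce(q, n):
            a, b = 0, 1
            for i in range(n):
                q.put(b)
                a, b = b, a+b
            q.put(None)                # sentinel: no more values

        q = Queue.Queue()
        threading.Thread(target=produce, args=(q, 10)).start()
        while 1:
            value = q.get()
            if value is None:
                break
            print value,
        print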

    A final option is to use the Stackless[2][3] variant implementation of
    Python instead, which supports lightweight coroutines.  This has much
    the same programmatic benefits as the thread option, but is much more
    efficient.  However, Stackless is a radical and controversial
    rethinking of the Python core, and it may not be possible for Jython to
    implement the same semantics.  This PEP isn't the place to debate that,
    so suffice it to say here that generators provide a useful subset of
    Stackless functionality in a way that fits easily into the current
    Python implementation.

    That exhausts the current alternatives.  Some other high-level
    languages provide pleasant solutions, notably iterators in Sather[4],
    which were inspired by iterators in CLU; and generators in Icon[5], a
    novel language where every expression "is a generator".  There are
    differences among these, but the basic idea is the same:  provide a
    kind of function that can return an intermediate result ("the next
    value") to its caller, but maintaining the function's local state so
    that the function can be resumed again right where it left off.  A
    very simple example:

        def fib():
            a, b = 0, 1
            while 1:
                yield b
                a, b = b, a+b

    When fib() is first invoked, it sets a to 0 and b to 1, then yields b
    back to its caller.  The caller sees 1.  When fib is resumed, from its
    point of view the yield statement is really the same as, say, a print
    statement:  fib continues after the yield with all local state intact.
    a and b then become 1 and 1, and fib loops back to the yield, yielding
    1 to its invoker.  And so on.  From fib's point of view it's just
    delivering a sequence of results, as if via callback.  But from its
    caller's point of view, the fib invocation is an iterable object that
    can be resumed at will.  As in the thread approach, this allows both
    sides to be coded in the most natural ways; but unlike the thread
    approach, this can be done efficiently and on all platforms.  Indeed,
    resuming a generator should be no more expensive than a function call.
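
    Under the proposed semantics the caller can also drive fib()
    explicitly; a minimal sketch, assuming the iterator interface of
    PEP 234 (fib() never terminates, so the caller decides when to stop
    asking):

        g = fib()
        for i in range(10):
            print g.next(),    # each call resumes fib after its yield
        print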

    The same kind of approach applies to many producer/consumer functions.
    For example, tokenize.py could yield the next token instead of invoking
    a callback function with it as argument, and tokenize clients could
    iterate over the tokens in a natural way:  a Python generator is a kind
    of Python iterator[1], but of an especially powerful kind.
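
    A hedged sketch of what such a client might look like, with a toy
    stand-in for the tokenizer (illustrative names only, not the real
    tokenize interface):

        # Hypothetical generator-based tokenizer that simply yields
        # whitespace-separated words, one line at a time.
        def tokens(readline):
            while 1:
                line = readline()
                if not line:
                    return
                for word in line.split():
                    yield word

        # The client keeps its state in plain locals and can bail out
        # as soon as it has seen what it came for.
        import StringIO
        f = StringIO.StringIO("from __future__ import nested_scopes\n")
        for tok in tokens(f.readline):
            if tok == "import":
                print "found an import"
                break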


Specification

    A new statement, the "yield" statement, is introduced:

        yield_stmt:    "yield" [expression_list]

    This statement may only be used inside functions. A function which
    contains a yield statement is a so-called "generator function". A
    generator function may not contain return statements of the form:

        "return" expression_list

    It may, however, contain return statements of the form:

        "return"

    When a generator function is called, an iterator[6] is returned.
    Each time the .next() method of this iterator is called, the code
    in the body of the generator function is executed until a yield
    statement or a return statement is encountered, or until the end
    of the body is reached.

    If a yield statement is encountered during this execution, the
    state of the function is frozen, and a value is returned to the
    object calling .next().  If an empty yield statement is
    encountered, None is returned; otherwise, the value of the given
    expression list is returned.

    If an empty return statement is encountered, no value is returned;
    instead, a StopIteration exception is raised, signalling that the
    iterator is exhausted.
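
    A minimal sketch of these rules, with illustrative names and
    assuming the semantics just described:

        def pulses(n):
            i = 0
            while i < n:
                yield              # empty yield: .next() returns None
                i = i + 1
            return                 # empty return: raises StopIteration

        g = pulses(2)
        print g.next()             # prints None
        print g.next()             # prints None
        g.next()                   # StopIteration is raised here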

    An example of how generators may be used is given below:

        # A binary tree class
        class Tree:
            def __init__(self, label, left=None, right=None):
                self.label = label
                self.left = left
                self.right = right
            def __repr__(self, level=0, indent="    "):  
                s = level*indent + `self.label`
                if self.left:
                    s = s + "\n" + self.left.__repr__(level+1, indent)
                if self.right:
                    s = s + "\n" + self.right.__repr__(level+1, indent)
                return s
            def __iter__(self):
                return inorder(self)
        
        # A function that creates a tree from a list
        def tree(list):
            if not len(list):
                return []
            i = len(list)/2
            return Tree(list[i], tree(list[:i]), tree(list[i+1:]))
        
        # A recursive generator that generates the tree labels in in-order
        def inorder(t):
            if t:
                for x in inorder(t.left): yield x
                yield t.label
                for x in inorder(t.right): yield x
        
        # Show it off: create a tree
        t = tree("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
        # Print the nodes of the tree in in-order
        for x in t: print x,
        print
        
        # A non-recursive generator that yields the same labels in-order.
        def inorder(node):
            stack = []
            while node:
                while node.left:
                    # Descend to the leftmost node, remembering the path.
                    stack.append(node)
                    node = node.left
                yield node.label
                while not node.right:
                    # Climb back up until a node with a right subtree is
                    # found, yielding labels on the way.
                    try:
                        node = stack.pop()
                    except IndexError:
                        return
                    yield node.label
                node = node.right
    
        # Exercise the non-recursive generator
        for x in t: print x,
        print
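
    The for loops above rely on the iterator protocol of PEP 234; as a
    hedged sketch, the same traversal can also be driven by hand:

        # Walk the tree by calling the iterator's .next() explicitly.
        it = iter(t)               # Tree.__iter__ returns inorder(t)
        try:
            while 1:
                print it.next(),   # resume the generator for each label
        except StopIteration:
            print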


Reference Implementation

    A preliminary patch against the CVS Python source is available[7].


Footnotes and References

    [1] PEP 234, http://python.sourceforge.net/peps/pep-0234.html
    [2] http://www.stackless.com/
    [3] PEP 219, http://python.sourceforge.net/peps/pep-0219.html
    [4] "Iteration Abstraction in Sather"
        Murer, Omohundro, Stoutamire and Szyperski
        http://www.icsi.berkeley.edu/~sather/Publications/toplas.html
    [5] http://www.cs.arizona.edu/icon/
    [6] The concept of iterators is described in PEP 234
        http://python.sourceforge.net/peps/pep-0234.html
    [7] http://python.ca/nas/python/generator.diff


Copyright

    This document has been placed in the public domain.



Local Variables:
mode: indented-text
indent-tabs-mode: nil
End: