[Python-Dev] Generator details

Guido van Rossum guido@CNRI.Reston.VA.US
Thu, 08 Jul 1999 01:08:44 -0400


I have a few questions/suggestions about generators.

Tim writes that a suspended generator has exactly one stack frame.
I'm not sure I like that.  The Demo/thread/Generator.py version has no 
such restriction; anything that has a reference to the generator can
put() the next value.  Is the restriction really necessary?  I can see 
a good use for a recursive generator, e.g. one that generates a tree
traversal:

    def inorder(node):
        if node.left: inorder(node.left)
        suspend node
        if node.right: inorder(node.right)

If I understand Tim, this could not work because there's more than one
stack frame involved.  On the other hand, he seems to suggest that
something like this *is* allowed when using "modern" coroutines.  Am I
missing something?  I though that tree traversal was one of Tim's
first examples of generators; would I really have to use an explicit
stack to create the traversal?

Next, I want more clarity about the initialization and termination
conditions.

The Demo/thread/Generator.py version is very explicit about
initialization: you instantiate the Generator class, passing it a
function that takes a Generator instance as an argument; the function
executes in a new thread.  (I guess I would've used a different
interface now -- perhaps inheriting from the Generator class
overriding a run() method.)  For termination, the normal way to stop
seems to be for the generator function to return (rather than calling
g.put()), the consumer then gets an EOFError exception the next time
it calls g.get().  There's also a way for either side to call g.kill()
to stop the generator prematurely.

Let me try to translate that to a threadless implementation.  We could 
declare a simple generator as follows:

    generator reverse(seq):
        i = len(seq)
        while i > 0:
            i = i-1
            suspend seq[i]

This could be translated by the Python translator into the following,
assuming a system class generator which provides the machinery for
generators:

    class reverse(generator):
        def run(self, seq):
            i = len(seq)
            while i > 0:
                i = i-1
                self.suspend(seq[i])

(Perhaps the identifiers generator, run and suspend would be spelled
with __...__, but that's just clutter for now.)

Now where Tim was writing examples like this:

    for c in reverse("Hello world"):
        print c,
    print

I'd like to guess what the underlying machinery would look like.  For
argument's sake, let's assume the for loop recognizes that it's using
a generator (or better, it always needs a generator, and when it's not
a generator it silently implies a sequence-iterating generator).
So the translator could generate the following:

    g = reverse("Hello world") # instantiate class reverse
    while 1:
        try:
            c = g.resume()
        except EOGError: # End Of Generator
            break
        print c,
    print

(Where g should really be a unique temporary local variable.)

In this model, the g.resume() and g.suspend() calls have all the magic.
They should not be accessible to the user.  They are written in C so
they can play games with frame objects.

I guess that the *first* call to g.resume(), for a particular
generator instance, should start the generator's run() method; run()
is not activated by the instantiation of the generator.  Then run()
runs until the first suspend() call, which causes the return from the
resume() call to happen.  Subsequent resume() calls know that there's
already is a frame (it's stored in the generator instance) and simply
continue its execution where it was.  If the run() method returns from
the frame, the resume() call is made to raise EOGError (blah, bogus
name) which signals the end of the loop.  (The user may write this
code explicitly if they want to consume the generated elements in a
different way than through a for loop.)

Looking at this machinery, I think the recursive generator that I
wanted could be made to work, by explicitly declaring a generator
subclass (instead of using the generator keyword, which is just
syntactic sugar) and making calls to methods of self, e.g.:

    class inorder(generator):
        def run(self, node):
            if node.left: self.run(node.left)
            self.suspend(node)
            if node.right: self.run(node.right)

The generator machinery would (ab)use the fact that Python frames
don't necessarily have to be linked in a strict stack order; the
generator gets a pointer to the frame to resume from resume(), and
there's a "bottom" frame which, when hit, raises the EOGError
exception.  All currently active frames belonging to the generator
stay alive while another resume() is possible.

All this is possible by the introduction of an explicit generator
object.  I think Tim had an implementation in mind where the standard
return pointer in the frame is the only thing necessary; actually, I
think the return pointer is stored in the calling frame, not in the
called frame (Christian?  Is this so in your version?).  That
shouldn't make a difference, except that it's not clear to me how to
reference the frame (in the explicitly coded version, which has to
exist at least at the bytecode level).

With classic coroutines, I believe that there's no difference between
the first call and subsequent calls to the coroutine.  This works in
the Knuth world where coroutines and recursion don't go together; but
at least for generators I would hope that it's possible for multiple
instances of the same generator to be active simultaneously (e.g. I
could be reversing over a list of files and then reverse each of the
lines in the file; this uses separate instances of the reverse()
generator).  So we need a way to reference the generator instance
separately from the generator constructor.  The machinery I sketched
above solves this.

After Tim has refined or rebutted this, I think I'll be able to
suggest what to do for coroutines.

(I'm still baffled by continuations.  The question whether the for
saved and restored loop should find itself in the 1st or 5th iteration 
surprises me.  Doesn't this cleanly map into some Scheme code that
tells us what to do?  Or is it unclear because Scheme does all loops
through recursion?  I presume that if you save the continuation of the 
1st iteration and restore it in the 5th, you'd find yourself in the
back 1st iteration?  But this is another thread.)

--Guido van Rossum (home page: http://www.python.org/~guido/)