|Title:||Coroutines via Enhanced Generators|
|Author:||Guido van Rossum, Phillip J. Eby|
- Specification Summary
- Specification: Sending Values into Generators
- Specification: Exceptions and Cleanup
- Optional Extensions
- Open Issues
- Reference Implementation
This PEP proposes some enhancements to the API and syntax of generators, to make them usable as simple coroutines. It is basically a combination of ideas from these two PEPs, which may be considered redundant if this PEP is accepted:
- PEP 288, Generators Attributes and Exceptions. The current PEP covers its second half, generator exceptions (in fact the throw() method name was taken from PEP 288). PEP 342 replaces generator attributes, however, with a concept from an earlier revision of PEP 288, the yield expression.
- PEP 325, Resource-Release Support for Generators. PEP 342 ties up a few loose ends in the PEP 325 spec, to make it suitable for actual implementation.
Coroutines are a natural way of expressing many algorithms, such as simulations, games, asynchronous I/O, and other forms of event-driven programming or co-operative multitasking. Python's generator functions are almost coroutines -- but not quite -- in that they allow pausing execution to produce a value, but do not provide for values or exceptions to be passed in when execution resumes. They also do not allow execution to be paused within the try portion of try/finally blocks, and therefore make it difficult for an aborted coroutine to clean up after itself.
Also, generators cannot yield control while other functions are executing, unless those functions are themselves expressed as generators, and the outer generator is written to yield in response to values yielded by the inner generator. This complicates the implementation of even relatively simple use cases like asynchronous communications, because calling any functions either requires the generator to block (i.e. be unable to yield control), or else a lot of boilerplate looping code must be added around every needed function call.
However, if it were possible to pass values or exceptions into a generator at the point where it was suspended, a simple co-routine scheduler or trampoline function would let coroutines call each other without blocking -- a tremendous boon for asynchronous applications. Such applications could then write co-routines to do non-blocking socket I/O by yielding control to an I/O scheduler until data has been sent or becomes available. Meanwhile, code that performs the I/O would simply do something like this:
data = (yield nonblocking_read(my_socket, nbytes))
in order to pause execution until the nonblocking_read() coroutine produced a value.
In other words, with a few relatively minor enhancements to the language and to the implementation of the generator-iterator type, Python will be able to support performing asynchronous operations without needing to write the entire application as a series of callbacks, and without requiring the use of resource-intensive threads for programs that need hundreds or even thousands of co-operatively multitasking pseudothreads. Thus, these enhancements will give standard Python many of the benefits of the Stackless Python fork, without requiring any significant modification to the CPython core or its APIs. In addition, these enhancements should be readily implementable by any Python implementation (such as Jython) that already supports generators.
By adding a few simple methods to the generator-iterator type, and with two minor syntax adjustments, Python developers will be able to use generator functions to implement co-routines and other forms of co-operative multitasking. These methods and adjustments are:
- Redefine yield to be an expression, rather than a statement. The current yield statement would become a yield expression whose value is thrown away. A yield expression's value is None whenever the generator is resumed by a normal next() call.
- Add a new send() method for generator-iterators, which resumes the generator and sends a value that becomes the result of the current yield-expression. The send() method returns the next value yielded by the generator, or raises StopIteration if the generator exits without yielding another value.
- Add a new throw() method for generator-iterators, which raises an exception at the point where the generator was paused, and which returns the next value yielded by the generator, raising StopIteration if the generator exits without yielding another value. (If the generator does not catch the passed-in exception, or raises a different exception, then that exception propagates to the caller.)
- Add a close() method for generator-iterators, which raises GeneratorExit at the point where the generator was paused. If the generator then raises StopIteration (by exiting normally, or due to already being closed) or GeneratorExit (by not catching the exception), close() returns to its caller. If the generator yields a value, a RuntimeError is raised. If the generator raises any other exception, it is propagated to the caller. close() does nothing if the generator has already exited due to an exception or normal exit.
- Add support to ensure that close() is called when a generator iterator is garbage-collected.
- Allow yield to be used in try/finally blocks, since garbage collection or an explicit close() call would now allow the finally clause to execute.
A prototype patch implementing all of these changes against the current Python CVS HEAD is available as SourceForge patch #1223381 (http://python.org/sf/1223381).
A new method for generator-iterators is proposed, called send(). It takes exactly one argument, which is the value that should be sent in to the generator. Calling send(None) is exactly equivalent to calling a generator's next() method. Calling send() with any other value is the same, except that the value produced by the generator's current yield expression will be different.
Because generator-iterators begin execution at the top of the generator's function body, there is no yield expression to receive a value when the generator has just been created. Therefore, calling send() with a non-None argument is prohibited when the generator iterator has just started, and a TypeError is raised if this occurs (presumably due to a logic error of some kind). Thus, before you can communicate with a coroutine you must first call next() or send(None) to advance its execution to the first yield expression.
As with the next() method, the send() method returns the next value yielded by the generator-iterator, or raises StopIteration if the generator exits normally, or has already exited. If the generator raises an uncaught exception, it is propagated to send()'s caller.
The yield-statement will be allowed to be used on the right-hand side of an assignment; in that case it is referred to as yield-expression. The value of this yield-expression is None unless send() was called with a non-None argument; see below.
A yield-expression must always be parenthesized except when it occurs at the top-level expression on the right-hand side of an assignment. So
x = yield 42 x = yield x = 12 + (yield 42) x = 12 + (yield) foo(yield 42) foo(yield)
are all legal, but
x = 12 + yield 42 x = 12 + yield foo(yield 42, 12) foo(yield, 12)
are all illegal. (Some of the edge cases are motivated by the current legality of yield 12, 42.)
Note that a yield-statement or yield-expression without an expression is now legal. This makes sense: when the information flow in the next() call is reversed, it should be possible to yield without passing an explicit value (yield is of course equivalent to yield None).
When send(value) is called, the yield-expression that it resumes will return the passed-in value. When next() is called, the resumed yield-expression will return None. If the yield-expression is a yield-statement, this returned value is ignored, similar to ignoring the value returned by a function call used as a statement.
In effect, a yield-expression is like an inverted function call; the argument to yield is in fact returned (yielded) from the currently executing function, and the return value of yield is the argument passed in via send().
Note: the syntactic extensions to yield make its use very similar to that in Ruby. This is intentional. Do note that in Python the block passes a value to the generator using send(EXPR) rather than return EXPR, and the underlying mechanism whereby control is passed between the generator and the block is completely different. Blocks in Python are not compiled into thunks; rather, yield suspends execution of the generator's frame. Some edge cases work differently; in Python, you cannot save the block for later use, and you cannot test whether there is a block or not. (XXX - this stuff about blocks seems out of place now, perhaps Guido can edit to clarify.)
Let a generator object be the iterator produced by calling a generator function. Below, g always refers to a generator object.
The syntax for generator functions is extended to allow a yield-statement inside a try-finally statement.
g.throw(type, value, traceback) causes the specified exception to be thrown at the point where the generator g is currently suspended (i.e. at a yield-statement, or at the start of its function body if next() has not been called yet). If the generator catches the exception and yields another value, that is the return value of g.throw(). If it doesn't catch the exception, the throw() appears to raise the same exception passed it (it falls through). If the generator raises another exception (this includes the StopIteration produced when it returns) that exception is raised by the throw() call. In summary, throw() behaves like next() or send(), except it raises an exception at the suspension point. If the generator is already in the closed state, throw() just raises the exception it was passed without executing any of the generator's code.
The effect of raising the exception is exactly as if the statement:
raise type, value, traceback
was executed at the suspension point. The type argument must not be None, and the type and value must be compatible. If the value is not an instance of the type, a new exception instance is created using the value, following the same rules that the raise statement uses to create an exception instance. The traceback, if supplied, must be a valid Python traceback object, or a TypeError occurs.
Note: The name of the throw() method was selected for several reasons. Raise is a keyword and so cannot be used as a method name. Unlike raise (which immediately raises an exception from the current execution point), throw() first resumes the generator, and only then raises the exception. The word throw is suggestive of putting the exception in another location, and is already associated with exceptions in other languages.
Alternative method names were considered: resolve(), signal(), genraise(), raiseinto(), and flush(). None of these seem to fit as well as throw().
A new standard exception is defined, GeneratorExit, inheriting from Exception. A generator should handle this by re-raising it (or just not catching it) or by raising StopIteration.
g.close() is defined by the following pseudo-code:
def close(self): try: self.throw(GeneratorExit) except (GeneratorExit, StopIteration): pass else: raise RuntimeError("generator ignored GeneratorExit") # Other exceptions are not caught
g.__del__() is a wrapper for g.close(). This will be called when the generator object is garbage-collected (in CPython, this is when its reference count goes to zero). If close() raises an exception, a traceback for the exception is printed to sys.stderr and further ignored; it is not propagated back to the place that triggered the garbage collection. This is consistent with the handling of exceptions in __del__() methods on class instances.
If the generator object participates in a cycle, g.__del__() may not be called. This is the behavior of CPython's current garbage collector. The reason for the restriction is that the GC code needs to break a cycle at an arbitrary point in order to collect it, and from then on no Python code should be allowed to see the objects that formed the cycle, as they may be in an invalid state. Objects hanging off a cycle are not subject to this restriction.
Note that it is unlikely to see a generator object participate in a cycle in practice. However, storing a generator object in a global variable creates a cycle via the generator frame's f_globals pointer. Another way to create a cycle would be to store a reference to the generator object in a data structure that is passed to the generator as an argument (e.g., if an object has a method that's a generator, and keeps a reference to a running iterator created by that method). Neither of these cases are very likely given the typical patterns of generator use.
Also, in the CPython implementation of this PEP, the frame object used by the generator should be released whenever its execution is terminated due to an error or normal exit. This will ensure that generators that cannot be resumed do not remain part of an uncollectable reference cycle. This allows other code to potentially use close() in a try/finally or with block (per PEP 343) to ensure that a given generator is properly finalized.
An earlier draft of this PEP proposed a new continue EXPR syntax for use in for-loops (carried over from PEP 340), that would pass the value of EXPR into the iterator being looped over. This feature has been withdrawn for the time being, because the scope of this PEP has been narrowed to focus only on passing values into generator-iterators, and not other kinds of iterators. It was also felt by some on the Python-Dev list that adding new syntax for this particular feature would be premature at best.
Discussion on python-dev has revealed some open issues. I list them here, with my preferred resolution and its motivation. The PEP as currently written reflects this preferred resolution.
What exception should be raised by close() when the generator yields another value as a response to the GeneratorExit exception?
I originally chose TypeError because it represents gross misbehavior of the generator function, which should be fixed by changing the code. But the with_template decorator class in PEP 343 uses RuntimeError for similar offenses. Arguably they should all use the same exception. I'd rather not introduce a new exception class just for this purpose, since it's not an exception that I want people to catch: I want it to turn into a traceback which is seen by the programmer who then fixes the code. So now I believe they should both raise RuntimeError. There are some precedents for that: it's raised by the core Python code in situations where endless recursion is detected, and for uninitialized objects (and for a variety of miscellaneous conditions).
Oren Tirosh has proposed renaming the send() method to feed(), for compatibility with the consumer interface (see http://effbot.org/zone/consumer.htm for the specification.)
However, looking more closely at the consumer interface, it seems that the desired semantics for feed() are different than for send(), because send() can't be meaningfully called on a just-started generator. Also, the consumer interface as currently defined doesn't include handling for StopIteration.
Therefore, it seems like it would probably be more useful to create a simple decorator that wraps a generator function to make it conform to the consumer interface. For example, it could warm up the generator with an initial next() call, trap StopIteration, and perhaps even provide reset() by re-invoking the generator function.
A simple consumer decorator that makes a generator function automatically advance to its first yield point when initially called:
def consumer(func): def wrapper(*args,**kw): gen = func(*args, **kw) gen.next() return gen wrapper.__name__ = func.__name__ wrapper.__dict__ = func.__dict__ wrapper.__doc__ = func.__doc__ return wrapper
An example of using the consumer decorator to create a reverse generator that receives images and creates thumbnail pages, sending them on to another consumer. Functions like this can be chained together to form efficient processing pipelines of consumers that each can have complex internal state:
@consumer def thumbnail_pager(pagesize, thumbsize, destination): while True: page = new_image(pagesize) rows, columns = pagesize / thumbsize pending = False try: for row in xrange(rows): for column in xrange(columns): thumb = create_thumbnail((yield), thumbsize) page.write( thumb, col*thumbsize.x, row*thumbsize.y ) pending = True except GeneratorExit: # close() was called, so flush any pending output if pending: destination.send(page) # then close the downstream consumer, and exit destination.close() return else: # we finished a page full of thumbnails, so send it # downstream and keep on looping destination.send(page) @consumer def jpeg_writer(dirname): fileno = 1 while True: filename = os.path.join(dirname,"page%04d.jpg" % fileno) write_jpeg((yield), filename) fileno += 1 # Put them together to make a function that makes thumbnail # pages from a list of images and other parameters. # def write_thumbnails(pagesize, thumbsize, images, output_dir): pipeline = thumbnail_pager( pagesize, thumbsize, jpeg_writer(output_dir) ) for image in images: pipeline.send(image) pipeline.close()
A simple co-routine scheduler or trampoline that lets coroutines call other coroutines by yielding the coroutine they wish to invoke. Any non-generator value yielded by a coroutine is returned to the coroutine that called the one yielding the value. Similarly, if a coroutine raises an exception, the exception is propagated to its caller. In effect, this example emulates simple tasklets as are used in Stackless Python, as long as you use a yield expression to invoke routines that would otherwise block. This is only a very simple example, and far more sophisticated schedulers are possible. (For example, the existing GTasklet framework for Python (http://www.gnome.org/~gjc/gtasklet/gtasklets.html) and the peak.events framework (http://peak.telecommunity.com/) already implement similar scheduling capabilities, but must currently use awkward workarounds for the inability to pass values or exceptions into generators.)
import collections class Trampoline: """Manage communications between coroutines""" running = False def __init__(self): self.queue = collections.deque() def add(self, coroutine): """Request that a coroutine be executed""" self.schedule(coroutine) def run(self): result = None self.running = True try: while self.running and self.queue: func = self.queue.popleft() result = func() return result finally: self.running = False def stop(self): self.running = False def schedule(self, coroutine, stack=(), val=None, *exc): def resume(): value = val try: if exc: value = coroutine.throw(value,*exc) else: value = coroutine.send(value) except: if stack: # send the error back to the "caller" self.schedule( stack, stack, *sys.exc_info() ) else: # Nothing left in this pseudothread to # handle it, let it propagate to the # run loop raise if isinstance(value, types.GeneratorType): # Yielded to a specific coroutine, push the # current one on the stack, and call the new # one with no args self.schedule(value, (coroutine,stack)) elif stack: # Yielded a result, pop the stack and send the # value to the caller self.schedule(stack, stack, value) # else: this pseudothread has ended self.queue.append(resume)
A simple echo server, and code to run it using a trampoline (presumes the existence of nonblocking_read, nonblocking_write, and other I/O coroutines, that e.g. raise ConnectionLost if the connection is closed):
# coroutine function that echos data back on a connected # socket # def echo_handler(sock): while True: try: data = yield nonblocking_read(sock) yield nonblocking_write(sock, data) except ConnectionLost: pass # exit normally if connection lost # coroutine function that listens for connections on a # socket, and then launches a service "handler" coroutine # to service the connection # def listen_on(trampoline, sock, handler): while True: # get the next incoming connection connected_socket = yield nonblocking_accept(sock) # start another coroutine to handle the connection trampoline.add( handler(connected_socket) ) # Create a scheduler to manage all our coroutines t = Trampoline() # Create a coroutine instance to run the echo_handler on # incoming connections # server = listen_on( t, listening_socket("localhost","echo"), echo_handler ) # Add the coroutine to the scheduler t.add(server) # loop forever, accepting connections and servicing them # "in parallel" # t.run()
A prototype patch implementing all of the features described in this PEP is available as SourceForge patch #1223381 (http://python.org/sf/1223381).
This patch was committed to CVS 01-02 August 2005.
Raymond Hettinger (PEP 288) and Samuele Pedroni (PEP 325) first formally proposed the ideas of communicating values or exceptions into generators, and the ability to close generators. Timothy Delaney suggested the title of this PEP, and Steven Bethard helped edit a previous version. See also the Acknowledgements section of PEP 340.
This document has been placed in the public domain.