[Python-Dev] uthread strawman

Tue, 07 Nov 2000 11:14:21 -0500

I've thought about it a little more, and drawn some pictures in my
head.

I still have to disagree with Christian when he says:

> Making Python completely coroutine aware, without
> tricking the C stack, is 90 percent of the problem.
> But after walking that far, there is no reason
> to leave the other 10 percent alone.

Without continuations, but with microthreads (uthreads) or coroutines,
each (Python) stack frame can simply be "paused" at a specific point
and continued later.  The semantics here are completely clear (except
perhaps for end cases such as unhandled exceptions and intervening C
stack frames).

With continuations, you have to decide how much state to save for a
future continuation.  It would seem easy enough: save all state kept
in the frame except for the local variables.  But now consider this:
the "evaluation stack" contained in each frame could easily be
replaced by a bunch of temporary variables, if we had a slightly
different instruction set (3-address opcodes instead of stack-based
opcodes).  Then would we save those temporary variables or not?  it
can make a difference!  Since the "save continuation" operation is a
function call, you can easily save a continuation while there are some
items on the value stack.  I believe the current implementation saves
these so they are restored when you jump to the continuation.  But if
the compiler were to use temporary variables instead of the evaluation
stack, they might not have been restored!

Here's another example.  Suppose you set up a for loop.  After three
iterations through the loop you save a continuation.  Then you finish
hree more iterations.  Then you return to the saved continuation.
Where does the loop continue: at 3 or at 6 iterations?  Try to answer
this without trying it.  My guess: it gets restarted at 3 iterations,
because the loop index is saved on the value stack.  If you rewrite
this using a while loop, however, it would get restarted at 6
iterations, because then your loop index is an unsaved local variable.
Ditto if you changed the bytecode compiler so for loops use an
anonymous local variable instead of an entry on the evaluation
stack.

This semantic imprecision is one of the reasons why I don't like the
concept of continuations.  (I've been told that the exact semantics of
continuations in Scheme differ greatly between Scheme implementations.)

Now let's look at Jython.  In Jython, we can simulate "paused frames"
well enough by using real Java threads.  However full continuations
would require modifications to the JVM -- which is unacceptable to a
language boasting "100% Pure Java".  Another reason against allowing
continuations.

So, all in all, I don't think of continuations as "the last 10% that
we might as well add to finish the job."  I see it as an ill-specified
hypergeneralization.

What *do* I want to see in a stackless PEP?

Not surprisingly, I'd like to see generators, coroutines, and
uthreads.  These all share a mechanism that pauses one frame and
resumes another.  I propose to make the concept of uthreads
fundamental -- this will simplify the emulation in Jython.

A strawman proposal:

The uthread module provides the new functionality at the lowest level.
Uthread objects represent microthreads.  An uthread has a chain of
stack frames linked by back pointers just like a regular thread.
Pause/resume operations are methods on uthread objects.  Pause/resume
operations do not address specific frames but specific uthreads;
within an uthread the normal call/return mechanisms can be used, and
only the top frame in the uthread's stack of call frames can be
paused/resumed (the ones below it are paused implicitly by the call to
the next frame, and resumed when that call returns).

- u = uthread.new(func) creates a new uthread object, u.  The new
  uthread is poised to call func() but doesn't execute yet.

- u = uthread.current() returns the uthread object for the current
  frame.

- u.yield() pauses the current uthread and resume the uthread u where
  it was paused.  The current uthread is resumed when some other
  uthread calls its yield() method.  Calling uthread.current().yield()
  is a no-op.

- When func() returns, the uthread that was executing it ceases to be
  runnable.  The uthread that most recently yielded to it is resumed,
  unless that is no longer runnable, in which case the uthread that
  most recently yielded to *it* is resumed, and so on until a runnable
  uthread is found or until no runnable uthreads are left, in which
  case the program terminates.  (XXX I need a proof here that this
  works.)

- When func() raises an unhandled exception, the exception gets
  propagated using the same rules as when it returns, and similarly
  its uthread ceases to be runnable.

- u.kill(exc) causes the yield() call that paused u to raise the
  exception exc.  (This can be caught in a try/except of course.)

- Calling u.yield() or u.kill() for a non-runnable uthread is an error
  and raises an exception.

I think this API should enough to implement Gordon's SelectDispatcher
code.  In general, it's easy to create a scheduler uthread that
schedules other uthreads.

Open issues:

- I'm not sure that I got the start conditions right.  Should func() be
  be allowed to run until its first yield() when uthread.new(func) is
  called?

- I'm not sure that the rules for returning and raising exceptions
  from func() are the right ones.

- Should it be possible to pass a value to another uthread by passing
  an argument to u.yield(), which then gets returned by the resumed
  yield() call in that uthread?

- How do uthreads interact with real threads?  Uthreads are explicitly
  scheduled through yield() calls; real threads use preemptive
  scheduling.  I suppose we could create a new "main" uthread for each
  real thread.  But what if we yield() to an uthread that's already
  executing in another thread?  How is that error detected?

Please help!

--Guido van Rossum (home page: http://www.python.org/~guido/)