[Python-ideas] Late to the async party (PEP 3156)

Jason Tackaberry tack at urandom.ca
Sun Dec 16 20:11:48 CET 2012


On 12-12-16 11:27 AM, Guido van Rossum wrote:
> The PEP is definitely weak. Here are some thoughts/proposals though:
>
>   * You can't cancel a coroutine; however you can cancel a Task, which
>     is a Future wrapping a stack of coroutines linked via yield-from.
>

I'll just underline your statement that "you can't cancel a coroutine" 
here, since I'm referencing it later.

This distinction between "bare" coroutines, Futures, and Tasks is a bit 
foreign to me, since in Kaa all coroutines return (a subclass of) 
InProgress objects.

The Tasks section in the PEP says that a bare coroutine (is this the 
same as the previously defined "coroutine object"?) has much less 
overhead than a Task, but it's not clear to me why that would be, as 
both would ultimately need to be managed by the scheduler, wouldn't they?

I could imagine that a coroutine object is implemented as a C object for 
performance, and a Task is a Python class, and maybe that explains the 
difference.  But then why differentiate between Future and Task at all, 
particularly since they have the same interface?  (So I can't draw an 
analogy with jQuery's Deferreds and Promises, where Promises are a 
restricted form of Deferreds, exposed for public consumption to attach 
callbacks.)


>   * Cancellation only takes effect when a task is suspended.
>

Yes, this is intuitive.


>   * When you cancel a Task, the most deeply nested coroutine (the one
>     that caused it to be suspended) receives a special exception (I
>     propose to reuse concurrent.futures.CancelledError from PEP 3148).
>     If it doesn't catch this it bubbles all the way to the Task, and
>     then out from there.
>

So if the most deeply nested coroutine catches the CancelledError and 
doesn't reraise, it can prevent its cancellation?
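
Something like this, to make the question concrete (hypothetical code 
using the PEP's proposed names; some_io() is invented, and the tulip 
import is omitted as in the other sketches in this thread):

    from concurrent.futures import CancelledError

    @tulip.coroutine()
    def innermost():
        try:
            yield from some_io()
        except CancelledError:
            # By swallowing the exception instead of reraising it, this
            # coroutine (and hence the whole Task?) would survive the
            # cancellation.
            pass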

I took a similar approach, except that coroutines can't abort their own 
cancellation, and whether or not the nested coroutines actually get 
cancelled depends on whether something else was interested in their result.

Consider a coroutine chain where A yields B, B yields C, and C yields D, 
and we do B.abort():

  * if only C was interested in D's result, then D will get an
    InProgressAborted raised inside it (at whatever point it's currently
    suspended).  If something other than C was also waiting on D, D will
    not be affected
  * similarly, if only B was interested in C's result, then C will get
    an InProgressAborted raised inside it (at yield D).
  * B will get InProgressAborted raised inside it (at yield C)
  * for B, C and D, the coroutines will not be reentered and they are
    not allowed to yield a value that suggests they expect reentry. 
    There's nothing a coroutine can do to prevent its own demise.
  * A will get an InProgressAborted raised inside it (at yield B)
  * In all the above cases, the InProgressAborted instance has an origin
    attribute that is B's InProgress object
  * Although B, C, and D are now aborted, A isn't aborted.  It's allowed
    to yield again.
  * with Kaa, coroutines are abortable by default (so they are always
    like Tasks).  But in this example, B can prevent C from being
    aborted by yielding C().noabort() (see the sketch below)
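
To make the bullets concrete, here's a condensed, hypothetical version 
of the chain (the function bodies are invented, but the comments map to 
the points above):

    import kaa

    @kaa.coroutine()
    def D():
        # Suspended here; InProgressAborted is raised at this yield
        # (assuming nothing other than C is waiting on D).
        yield kaa.NotFinished

    @kaa.coroutine()
    def C():
        yield D()        # InProgressAborted raised here (at yield D)

    @kaa.coroutine()
    def B():
        yield C()        # InProgressAborted raised here (at yield C)
        # To shield C, B could instead yield C().noabort()

    @kaa.coroutine()
    def A():
        ip_b = B()       # invoking a coroutine returns its InProgress
        try:
            yield ip_b   # something elsewhere calls ip_b.abort()
        except kaa.InProgressAborted as e:
            # e.origin is ip_b; A itself is not aborted and is free
            # to yield again.
            yield 'recovered'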


There are quite a few scenarios to consider: A yields B and B is 
cancelled or raises; A yields B and A is cancelled or raises; A yields 
B, C yields B, and A is cancelled or raises; A yields B, C yields B, and 
A or C is cancelled or raises; A yields par(B,C,D) and B is cancelled or 
raises; etc, etc.

In my experience, there's no one-size-fits-all behaviour, and the best 
we can do is have sensible default behaviour with some API (different 
functions, kwargs, etc.) to control the cancellation propagation logic.


>   * However when a coroutine in one Task uses yield-from to wait for
>     another Task, the latter does not automatically get cancelled. So
>     this is a difference between "yield from foo()" and "yield from
>     Task(foo())", which otherwise behave pretty similarly. Of course
>     the first Task could catch the exception and cancel the second
>     task -- that is its responsibility though and not the default
>     behavior.
>

Ok, so nested bare coroutines will get cancelled implicitly, but nested 
Tasks won't?

I'm having a bit of difficulty with this one.  You said that coroutines 
can't be cancelled, but Tasks can be.  But here, if they are being 
yielded, the opposite behaviour applies: yielded coroutines /are/ 
cancelled if a Task is cancelled, but yielded tasks /aren't/.
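
To make my reading concrete (hypothetical code using the PEP's names; 
inner() is invented and imports are omitted):

    @tulip.coroutine()
    def outer():
        # Bare coroutine: if outer's Task is cancelled while suspended
        # here, the CancelledError is raised inside inner() as well.
        r1 = yield from inner()

        # Wrapped in a Task: cancelling outer's Task does *not* cancel
        # this one; outer would have to cancel it explicitly.
        r2 = yield from Task(inner())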

Or have I misunderstood?


>   * PEP 3156 has a par() helper which lets you block for multiple
>     tasks/coroutines in parallel. It takes arguments which are either
>     coroutines, Tasks, or other Futures; it wraps the coroutines in
>     Tasks to run them independently and just waits for the other
>     arguments. Proposal: when the Task containing the par() call is
>     cancelled, the par() call intercepts the cancellation and by
>     default cancels those coroutines that were passed in "bare" but
>     not the arguments that were passed in as Tasks or Futures. Some
>     keyword argument to par() may be used to change this behavior to
>     "cancel none" or "cancel all" (exact API spec TBD).
>

Here again, par() would cancel a bare coroutine but not Tasks.  It's 
consistent with your previous bullet but seems to contradict your first 
bullet that you can't cancel a coroutine.

I guess the distinction is you can't explicitly cancel a coroutine, but 
coroutines can be implicitly cancelled?

As I discussed previously, one of those tasks might be yielded by some 
other active coroutine, and so cancelling it may not be the right thing 
to do.  Being able to control this behaviour is important, whether 
that's a par() kwarg, or special method like noabort() that constructs 
an unabortable Task instance.

Kaa has similar constructs to allow yielding a collection of InProgress 
objects (whatever they might represent: coroutines, threaded functions, 
etc.).  In particular, it allows you to yield multiple tasks and resume 
when ALL of them complete (InProgressAll), or when ANY of them complete 
(InProgressAny).  For example:

    @kaa.coroutine()
    def is_any_host_up(*hosts):
        try:
            # ping() is a coroutine
            yield kaa.InProgressAny(ping(host) for host in hosts).timeout(5, abort=True)
        except kaa.TimeoutException:
            yield False
        else:
            yield True


More details here:

http://api.freevo.org/kaa-base/async/inprogress.html#inprogress-collections

From what I understand of the proposed par(), it would require ALL of 
the supplied futures to complete, but there are many use-cases for the 
ANY variant as well.


> Interesting. In Tulip v1 (the experimental version I wrote before PEP 
> 3156) the Task() constructor has an optional timeout argument. It 
> works by scheduling a callback at the given time in the future, and 
> the callback simply cancel the task (which is a no-op if the task has 
> already completed). It works okay, except it generates tracebacks that 
> are sometimes logged and sometimes not properly caught -- though some 
> of that may be my messy test code. The exception raised by a timeout 
> is the same CancelledError, which is somewhat confusing. I wonder if 
> Task.cancel() shouldn't take an exception with which to cancel the 
> task. (TimeoutError in PEP 3148 has a different role, it is when 
> the timeout on a specific wait expires, so e.g. fut.result(timeout=2) 
> waits up to 2 seconds for fut to complete, and if not, the call raises 
> TimeoutError, but the code running in the executor is unaffected.)

FWIW, the equivalent in Kaa, InProgress.abort(), does take an optional 
exception, which must subclass InProgressAborted; if None, a new 
InProgressAborted is created.  InProgress.timeout(t) will start a timer 
that invokes InProgress.abort(TimeoutException()) (TimeoutException 
subclasses InProgressAborted).

It sounds like your proposed implementation works like:

    @tulip.coroutine()
    def foo():
        try:
            result = yield from Task(othercoroutine()).result(timeout=2)
        except TimeoutError:
            # ... othercoroutine() still lives on
            pass

I think Kaa's syntax is cleaner but it seems functionally the same:

    @kaa.coroutine()
    def foo():
        try:
            result = yield othercoroutine().timeout(2)
        except kaa.TimeoutException:
            # ... othercoroutine() still lives on
            pass

It's also possible to conveniently ensure that othercoroutine() is 
aborted if the timeout elapses:

    try:
        result = yield othercoroutine().timeout(2, abort=True)
    except kaa.TimeoutException:
        # ... othercoroutine() is aborted
        pass


> We've had long discussions about yield vs. yield-from. The latter is 
> way more efficient and that's enough for me to push it through. When 
> using yield, each yield causes you to bounce to the scheduler, which 
> has to do a lot of work to decide what to do next, even if that is 
> just resuming the suspended generator; and the scheduler is 
> responsible for keeping track of the stack of generators. When using 
> yield-from, calling another coroutine as a subroutine is almost free 
> and doesn't involve the scheduler at all; thus it's much cheaper, and 
> the scheduler can be simpler (doesn't need to keep track of the 
> stack). Also stack traces and debugging are better. 

But this sounds like a consequence of a particular implementation, doesn't it?

A @kaa.coroutine() decorated function is entered right away when 
invoked, and the decorator logic does as much as it can until the 
underlying generator yields an unfinished InProgress that it needs to 
wait for (or kaa.NotFinished).  Once it yields, /then/ the decorator 
sets up the necessary hooks with the scheduler / event loop.

This means you can nest a stack of coroutines without involving the 
scheduler until something truly asynchronous needs to take place.
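
In other words, something like this stripped-down sketch (illustration 
only, not Kaa's actual code; the InProgress class here is a minimal 
stand-in):

    import functools

    class InProgress(object):
        # Minimal stand-in for Kaa's InProgress, for illustration only.
        def __init__(self):
            self.finished = False
            self.result = None

        def finish(self, result):
            self.finished = True
            self.result = result

    def eager_coroutine(func):
        # Drive the generator synchronously for as long as everything
        # it yields is already finished; no scheduler is involved yet.
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            gen = func(*args, **kwargs)
            ip = InProgress()
            value = None
            try:
                value = next(gen)
                while isinstance(value, InProgress) and value.finished:
                    value = gen.send(value.result)
            except StopIteration:
                # Ran to completion without ever suspending; the last
                # yielded value is the coroutine's result.
                ip.finish(value)
                return ip
            # `value` is genuinely unfinished: only here would the real
            # decorator register with the event loop to resume `gen`
            # when `value` completes (omitted in this sketch).
            return ip
        return wrapper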

Have I misunderstood?


>       * coroutines can have certain policies that control invocation
>         behaviour.  The most obvious ones to describe are
>         POLICY_SYNCHRONIZED which ensures that multiple invocations of
>         the same coroutine are serialized, and POLICY_SINGLETON which
>         effectively ignores subsequent invocations if it's already running
>       * it is possible to have a special progress object passed into
>         the coroutine function so that the coroutine's progress can be
>         communicated to an outside observer
>
>
> These seem pretty esoteric and can probably be implemented in user code 
> if needed.

I'm fine with that, provided the flexibility is there to allow for it.


> As I said, I think wait_for_future() and run_in_executor() in the PEP 
> give you all you need. The @threaded decorator you propose is just 
> sugar; if a user wants to take an existing API and convert it from a 
> coroutine to threaded without requiring changes to the caller, they 
> can just introduce a helper that is run in a thread with 
> run_in_executor().

Also works for me. :)


> Thanks for your very useful contribution! Kaa looks like an 
> interesting system. Is it ported to Python 3 yet? Maybe you could look 
> into integrating with the PEP 3156 event loop and/or scheduler.

Kaa does work with Python 3, yes, although it still lacks much-needed 
unit tests, so I'm not completely confident it has the same functional 
coverage as it does on Python 2.

I'm definitely interested in having it conform to whatever shakes out of 
PEP 3156, which is why I'm speaking up now. :)


I've a couple other subjects I should bring up:

Tasks/Futures as "signals": it's often necessary to be able to resume a 
coroutine based on some condition other than e.g. any IO tasks it's 
waiting on.  For example, in one application, I have a 
(POLICY_SINGLETON) coroutine that works off a download queue.  If 
there's nothing in the queue, it's suspended at a yield.  It's the 
coroutine equivalent of a dedicated thread. [1]

It must be possible to "wake" the queue manager when I enqueue a job for 
it.  Kaa has this notion of "signals" which is similar to the gtk+ style 
of signals in that you can attach callbacks to them and emit them.  
Signals can be represented as InProgress objects, which means they can 
be yielded from coroutines and used in InProgressAny/All objects.

So my download manager coroutine can yield an InProgressAny of all the 
active download coroutines /and/ the "new job enqueued" signal, and 
execution will resume as soon as any of those conditions is met.
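
Roughly like this (a hypothetical, condensed version of the pattern; 
download() and the queue handling are invented, and I'm assuming the 
policy kwarg spelling, kaa.inprogress() for coercing the signal, and 
the InProgress.finished property):

    import kaa

    new_job = kaa.Signal()    # emitted whenever a job is enqueued
    queue = []

    def enqueue(job):
        queue.append(job)
        new_job.emit()        # wakes the manager if it's suspended

    @kaa.coroutine(policy=kaa.POLICY_SINGLETON)
    def manager():
        active = []
        while True:
            # Resume when any active download finishes OR when the
            # "new job enqueued" signal fires.
            yield kaa.InProgressAny(kaa.inprogress(new_job), *active)
            while queue:
                # download() is a coroutine returning an InProgress
                active.append(download(queue.pop(0)))
            active = [ip for ip in active if not ip.finished]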

Is there anything in your current proposal that would allow for this 
use-case?

[1] https://github.com/jtackaberry/stagehand/blob/master/src/manager.py#L390


Another pain point for me has been this notion of unhandled asynchronous 
exceptions.  Asynchronous tasks are represented as an InProgress object, 
and if a task fails, accessing InProgress.result will raise the 
exception, at which point it's considered handled. This attribute access 
could happen at any time during the lifetime of the InProgress object, 
outside the task's call stack.

The desirable behaviour is that when the InProgress object is destroyed, 
if there's an exception attached to it from a failed task that hasn't 
been accessed, we should output the stack as an unhandled exception.  In 
Kaa, I do this with a weakref destroy callback, but this isn't ideal 
because with GC, the InProgress might not be destroyed until well after 
the exception is relevant.
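
The mechanism, boiled down (a toy illustration, not Kaa's actual code; 
since the weakref callback can't touch the dead object, the exception 
state lives in a separate cell):

    import logging
    import weakref

    _refs = set()   # keeps the weakref objects themselves alive

    class AsyncResult(object):
        # Toy stand-in for an InProgress-style object.
        def __init__(self):
            # Shared with the weakref callback: [exc_info, handled]
            self._cell = [None, False]
            _refs.add(weakref.ref(self, _reporter(self._cell)))

        def throw(self, exc_info):
            self._cell[0] = exc_info

        @property
        def result(self):
            exc_info = self._cell[0]
            if exc_info is not None:
                self._cell[1] = True    # the exception is now handled
                raise exc_info[1]

    def _reporter(cell):
        def report(ref):
            _refs.discard(ref)
            exc_info, handled = cell
            if exc_info is not None and not handled:
                # Died with its exception never accessed: log it.
                logging.error('unhandled asynchronous exception',
                              exc_info=exc_info)
        return report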

I make every effort to remove reference cycles and generally get the 
InProgress object destroyed as early as possible, but this changes 
subtly between Python versions.

How will unhandled asynchronous exceptions be handled with tulip?

Thanks!
Jason.