[Python-ideas] The async API of the future: yield-from

Greg Ewing greg.ewing at canterbury.ac.nz
Sat Oct 13 07:05:53 CEST 2012


Guido van Rossum wrote:

> But the fact remains that you can't completely hide these yields --
> the best you can do is replace them with a single yield-from.

Yes, as things stand, a call to a sub-generator is always
going to look different from an ordinary call, all the way
up the call chain. I regard that as a wart remaining to be
fixed, although opinions seem to differ.

I do think it's a bit unfortunate that 'yield from' contains
the word 'yield', though, since in this context it's best
thought of as a kind of function call rather than a kind
of yield.

>>>This seems to be begging to be collapsed into a single line, e.g.
>>>
>>>      data = yield sock.recv_async(1024)
> 
>>I'm not sure how you're imagining that would work, but whatever
>>it is, it's wrong -- that just doesn't make sense.
> 
> It makes a lot of sense in a world using
> Futures and a Future-aware trampoline/scheduler, instead of yield-from
> and bare generators. I can see however that you don't like it in the
> yield-from world you're envisioning

I don't like it because, to my mind, Futures et al. are kludgy
workarounds for not having something like yield-from. Now that
we do, we shouldn't need them any more.

I can see the desirability of being able to interoperate with
existing code that uses them, but I'm not convinced that building
awareness of them into the guts of the scheduler is the best
way to go about it.

Why Futures in particular? What if someone wants to use Deferreds
instead, or some other similar thing? At some point you need
to build adapters. I'd rather see Futures treated on an equal
footing with the others, and dealt with by building on the
primitive facilities provided by the scheduler.

> But the only use for send() on a generator is when using it as a
> coroutine for a concurrent tasks system... And you're claiming, it seems,
> that you prefer yield-from for concurrent tasks.

The particular technique of using send() to supply a return
value for a simulated sub-generator call is made obsolete
by yield-from.

I can't rule out the possibility that there may be other
uses for send() in a concurrent task system. I just haven't
found the need for it in any of the examples I've developed
so far.

> I feel that "value = yield <something that returns a
> Future>" is quite a good paradigm,

I feel that it shouldn't be *necessary* to yield any kind
of special object in order to suspend a task; just a simple
'yield' should be sufficient.

It might make sense to allow this as an *option* for the
purpose of interoperating with existing async code. But
I would much rather the public API for this was something
like

    value = yield from wait_for_future(a_future)

leaving it up to the implementation whether this is achieved
by yielding the Future or by some other means. Then we can
also have wait_for_deferred(), etc., without giving any one
of them special status.
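
For concreteness, here's a rough sketch of how wait_for_future()
might look, assuming a concurrent.futures-style object with done()
and result() methods. (A real implementation would register with
the scheduler instead of polling on every pass.)

    def wait_for_future(future):
        # Suspend with a plain yield until the future reports
        # completion, then deliver its result as the value of
        # the enclosing 'yield from' expression.
        while not future.done():
            yield
        return future.result()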

> One is what to do with operations directly implemented in C. It would
> be horrible to require C to create a fake generator. Fortunately an
> iterator whose final __next__() raises StopIteration(<value>) works in
> the latest Python 3.3.

Well, such an iterator *is* a "fake generator" in all the
respects that the scheduler cares about. Especially if the
scheduler doesn't rely on send(), so your C object doesn't
have to implement a send() method. :-)
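
To illustrate, in Python rather than C and with made-up names, all
such an object needs is an __iter__ and a __next__ that eventually
raises StopIteration carrying the result:

    class FakeOp:
        # An iterator that suspends once (as if by a bare yield)
        # and then delivers its result via StopIteration, which
        # is all that yield-from requires of it.
        def __init__(self, value):
            self.value = value
            self.stepped = False

        def __iter__(self):
            return self

        def __next__(self):
            if not self.stepped:
                self.stepped = True
                return None    # one plain suspension
            raise StopIteration(self.value)

    def caller():
        result = yield from FakeOp(42)    # result == 42
        return result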

> Well, I'm talking about a decorator that you *always* apply, and which
> does nothing (or very little) when wrapping a generator, but adds
> generator behavior when wrapping a non-generator function.

As long as it's optional, I wouldn't object to the existence
of such a decorator, although I would probably choose not to
use it most of the time.

I would object if it was *required* to make things work
properly, because I would worry that this was a symptom of
unnecessary complication and inefficiency in the underlying
machinery.
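
For what it's worth, here is roughly what I understand such a
decorator to be (the name is hypothetical; it relies on the 3.3
behaviour that 'return value' in a generator raises
StopIteration(value)):

    import functools
    import inspect

    def task(fn):
        # Pass generator functions through untouched; wrap plain
        # functions so that calling them produces a generator that
        # finishes immediately with the function's result.
        if inspect.isgeneratorfunction(fn):
            return fn
        @functools.wraps(fn)
        def wrapper(*args, **kwds):
            return fn(*args, **kwds)
            yield    # never reached; makes wrapper a generator function
        return wrapper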

> (6) Spawning off multiple async subtasks
> 
> Futures:
>   f1 = subtask1(args1)  # Note: no yield!!!
>   f2 = subtask2(args2)
>   res1, res2 = yield f1, f2
> 
> Yield-from:
>   ??????????
> 
> *** Greg, can you come up with a good idiom to spell concurrency at
> this level? Your example only has concurrency in the philosophers
> example, but it appears to interact directly with the scheduler, and
> the philosophers don't return values. ***

I don't regard the need to interact directly with the scheduler
as a problem. That's because in the world I envisage, there would
only be *one* scheduler, for much the same reason that there can
really only be one async event handling loop in any given program.
It would be part of the standard library and have a well-known
API that everyone uses.

If you don't want things to be that way, then maybe this is a
good use for yielding things to the scheduler. Yielding a generator
could mean "spawn this as a concurrent task".

You could go further and say that yielding a tuple of generators
means to spawn them all concurrently, wait for them all to
complete and send back a tuple of the results. The yield-from
code would then look pretty much the same as the futures code.

However, I'm inclined to think that this is too much functionality
to build directly into the scheduler, and that it would be better
provided by a class or function that builds on more primitive
facilities. So it would look something like

Yield-from:
    task1 = subtask1(args1)
    task2 = subtask2(args2)
    res1, res2 = yield from par(task1, task2)

where the implementation of par() is left as an exercise for
the reader.
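
Having said that, one possible answer to the exercise, needing no
scheduler support at all, is to step the sub-generators round-robin
within the caller's own time slice and pass their suspensions
through. (A scheduler-based version would spawn them as genuinely
concurrent tasks instead.)

    def par(*tasks):
        # Interleave the subtasks, propagating each one's yields
        # up the chain, and collect their return values in order.
        results = [None] * len(tasks)
        pending = dict(enumerate(tasks))
        while pending:
            for i, t in list(pending.items()):
                try:
                    yield next(t)
                except StopIteration as e:
                    results[i] = e.value
                    del pending[i]
        return results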

> (7) Checking whether an operation is already complete
> 
> Futures:
>   if f.done(): ...

I'm inclined to think that this is not something the
scheduler needs to be directly concerned with. If it's
important for one task to know when another task is completed,
it's up to those tasks to agree on a way of communicating
that information between them.

Although... is there a way to non-destructively test whether
a generator is exhausted? If so, this could easily be provided
as a scheduler primitive.
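
As it turns out, inspect.getgeneratorstate() (new in Python 3.2)
can answer this without disturbing the generator:

    import inspect

    def finished(task):
        # A generator that has returned, raised or been closed is
        # in the GEN_CLOSED state; asking consumes nothing from it.
        return inspect.getgeneratorstate(task) == inspect.GEN_CLOSED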

> (8) Getting the result of an operation multiple times
> 
> Futures:
> 
>   f = async_op(args)
>   # squirrel away a reference to f somewhere else
>   r = yield f
>   # ... later, elsewhere
>   r = f.result()

Is this really a big deal? What's wrong with having to store
the return value away somewhere if you want to use it
multiple times?

> (9) Canceling an operation
> 
> Futures:
>   f.cancel()

This would be another scheduler primitive.

Yield-from:
    cancel(task)

This would remove the task from the ready list or whatever
queue it's blocked on, and probably throw an exception into
it to give it a chance to clean up.
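
The exception-throwing half is easy enough to sketch (CancelledError
is a made-up name here, and removing the task from the scheduler's
queues is omitted):

    class CancelledError(Exception):
        pass

    def cancel(task):
        # throw() raises the exception at the task's current yield
        # point, so try/finally blocks in it get a chance to run.
        try:
            task.throw(CancelledError)
        except (CancelledError, StopIteration):
            pass    # the task has terminated
        # (If throw() returns normally, the task caught the
        # exception and yielded again; a real scheduler would
        # need to decide what that should mean.)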

> (10) Registering additional callbacks
> 
> Futures:
>   f.add_done_callback(callback)

Another candidate for a higher-level facility, I think.
The API might look something like

Yield-from:
    cbt = task_with_callbacks(task)
    cbt.add_callback(callback)
    yield from cbt.run()
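
A rough cut of that wrapper, for illustration only (a factory
function would do just as well as a class):

    class task_with_callbacks:
        # run() drives the wrapped task to completion via yield-from
        # and then fires the registered callbacks with its result.
        def __init__(self, task):
            self.task = task
            self.callbacks = []

        def add_callback(self, callback):
            self.callbacks.append(callback)

        def run(self):
            result = yield from self.task
            for callback in self.callbacks:
                callback(result)
            return result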

I may have a go at coming up with implementations for some of
these things and send them in later posts.

-- 
Greg


