[Python-Dev] Tricky way of of creating a generator via a comprehension expression

Sun Nov 26 15:29:30 EST 2017

On Sat, Nov 25, 2017 at 3:37 PM, Guido van Rossum <guido at python.org> wrote:
> On Sat, Nov 25, 2017 at 1:05 PM, David Mertz <mertz at gnosis.cx> wrote:
>>
>> FWIW, on a side point. I use 'yield' and 'yield from' ALL THE TIME in real
>> code. Probably 80% of those would be fine with yield statements, but a
>> significant fraction use `gen.send()`.
>>
>> On the other hand, I have yet once to use 'await', or 'async' outside of
>> pedagogical contexts. There are a whole lot of generators, including ones
>> utilizing state injection, that are useful without the scaffolding of an
>> event loop, in synchronous code.
>
>
> Maybe you didn't realize async/await don't need an event loop? Driving an
> async/await-based coroutine is just as simple as driving a yield-from-based
> one (`await` does exactly the same thing as `yield from`).

Technically anything you can write with yield/yield from could also be
written using async/await and vice-versa, but I think it's actually
nice to have both in the language.

The distinction I'd make is that yield/yield from is what you should
use for ad hoc coroutines where the person writing the code that has
'yield from's in it is expected to understand the details of the
coroutine runner, while async/await is what you should use when the
coroutine running is handled by a library like asyncio, and the person
writing code with 'await's in it is expected to treat coroutine stuff
as an opaque implementation detail. (NB I'm using "coroutine" in the
CS sense here, where generators and async functions are both
"coroutines".)

I think of this as being sort of half-way between a style guideline
and a technical guideline. It's like the guideline that lists should
be homogenously-typed and variable length, while tuples are
heterogenously-typed and fixed length: there's nothing in the language
that outright *enforces* this, but it's a helpful convention *and*
things tend to work better if you go along with it.

Here are some technical issues you'll run into if you try to use
async/await for ad hoc coroutines:

- If you don't iterate an async function, you get a "coroutine never
awaited" warning. This may or may not be what you want.

- async/await has associated thread-global state like
sys.set_coroutine_wrapper and sys.set_asyncgen_hooks. Generally async
libraries assume that they own these, and arbitrarily weird things may
happen if you have multiple async/await coroutine runners in same
thread with no coordination between them.

- In async/await, it's not obvious how to write leaf functions:
'await' is equivalent to 'yield from', but there's no equivalent to
'yield'. You have to jump through some hoops by writing a class with a
custom __await__ method or using @types.coroutine. Of course it's
doable, and it's no big deal if you're writing a proper async library,
but it's awkward for quick ad hoc usage.

For a concrete example of 'ad hoc coroutines' where I think 'yield
from' is appropriate, here's wsproto's old 'yield from'-based
incremental websocket protocol parser:

    https://github.com/python-hyper/wsproto/blob/4b7db502cc0568ab2354798552148dadd563a4e3/wsproto/frame_protocol.py#L142

The flow here is: received_frames is the public API: it gives you an
iterator over all completed frames. When it stops you're expected to
add more data to the buffer and then call it again. Internally,
received_frames acts as a coroutine runner for parse_more_gen, which
is the main parser that calls various helper methods to parse
different parts of the websocket frame. These calls eventually bottom
out in _consume_exactly or _consume_at_most, which use 'yield' to
"block" until enough data is available in the internal buffer.
Basically this is the classic trick of using coroutines to write an
incremental state machine parser as ordinary-looking code where the
state is encoded in local variables on the stack.

Using coroutines here isn't just a cute trick; I'm pretty confident
that there is absolutely no other way to write a readable incremental
websocket parser in Python. This is the 3rd rewrite of wsproto's
parser, and I think I've read the code for all the other Python
libraries that do this too. The websocket framing format is branchy
enough that trying to write out the state machine explicitly will
absolutely tie you in knots. (Of course we then rewrote wsproto's
parser a 4th time for py2 compatibility; the current version's not
*terrible* but the 'yield from' version was simpler and more
maintainable.)

For wsproto's use case, I think using 'await' would be noticeably
worse than 'yield from'. It'd make the code more opaque to readers
(people know generators but no-one shows up already knowing what
@types.coroutine does), the "coroutine never awaited" warnings would
be obnoxious (it's totally fine to instantiate a parser and then throw
it away without using it!), and the global state issues would make us
very nervous (wsproto is absolutely designed to be used alongside a
library like asyncio or trio). But that's fine; 'yield from' exists
and is perfect for this application.

Basically this is a very long way of saying that actually the status
quo is pretty good, at least with regard to yield from vs. async/await
:-).

-n

-- 
Nathaniel J. Smith -- https://vorpus.org