[Python-Dev] Tricky way of creating a generator via a comprehension expression

Guido van Rossum guido at python.org
Sun Nov 26 18:51:16 EST 2017


On Sun, Nov 26, 2017 at 12:29 PM, Nathaniel Smith <njs at pobox.com> wrote:

> On Sat, Nov 25, 2017 at 3:37 PM, Guido van Rossum <guido at python.org>
> wrote:
> > On Sat, Nov 25, 2017 at 1:05 PM, David Mertz <mertz at gnosis.cx> wrote:
> >>
> >> FWIW, on a side point. I use 'yield' and 'yield from' ALL THE TIME in real
> >> code. Probably 80% of those would be fine with yield statements, but a
> >> significant fraction use `gen.send()`.
> >>
> >> On the other hand, I have yet to use 'await' or 'async' even once outside
> >> of pedagogical contexts. There are a whole lot of generators, including
> >> ones utilizing state injection, that are useful without the scaffolding of
> >> an event loop, in synchronous code.
> >
> >
> > Maybe you didn't realize async/await don't need an event loop? Driving an
> > async/await-based coroutine is just as simple as driving a
> > yield-from-based one (`await` does exactly the same thing as `yield from`).
>
> Technically anything you can write with yield/yield from could also be
> written using async/await and vice-versa, but I think it's actually
> nice to have both in the language.
>

Perhaps. You seem somewhat biased towards the devil you know, but you also
bring up some good points.


> The distinction I'd make is that yield/yield from is what you should
> use for ad hoc coroutines where the person writing the code that has
> 'yield from's in it is expected to understand the details of the
> coroutine runner, while async/await is what you should use when the
> coroutine running is handled by a library like asyncio, and the person
> writing code with 'await's in it is expected to treat coroutine stuff
> as an opaque implementation detail. (NB I'm using "coroutine" in the
> CS sense here, where generators and async functions are both
> "coroutines".)
>
> I think of this as being sort of half-way between a style guideline
> and a technical guideline. It's like the guideline that lists should
> be homogeneously-typed and variable length, while tuples are
> heterogeneously-typed and fixed length: there's nothing in the language
> that outright *enforces* this, but it's a helpful convention *and*
> things tend to work better if you go along with it.
>

Hm. That would disappoint me. We carefully tried to design async/await to
*not* require an event loop. (I'll get to the global state below.)
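
To make the point concrete, here is a rough sketch of driving an async
function by hand with nothing but send(), the same way one would drive a
yield-from-based generator. The names pause, adder and drive are made up
for this illustration, and the leaf uses @types.coroutine as one possible
way to suspend; none of this is asyncio API.

```python
import types


@types.coroutine
def pause(request):
    # Hypothetical leaf awaitable: hands `request` to whoever is driving the
    # coroutine and resumes with whatever the driver sends back.
    reply = yield request
    return reply


async def adder():
    # Ordinary async function; it neither knows nor cares who drives it.
    a = await pause("need a")
    b = await pause("need b")
    return a + b


def drive(coro, replies):
    """Run `coro` by hand, answering each of its requests from `replies`."""
    requests = []
    replies = iter(replies)
    to_send = None  # the first send() must be None to start the coroutine
    while True:
        try:
            request = coro.send(to_send)
        except StopIteration as stop:
            # The coroutine returned; its result rides on StopIteration.
            return requests, stop.value
        requests.append(request)
        to_send = next(replies)


requests, result = drive(adder(), [2, 3])
```

No event loop is involved: drive() plays the role that an explicit
next()/send() loop plays for a generator.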


> Here are some technical issues you'll run into if you try to use
> async/await for ad hoc coroutines:
>
> - If you don't iterate an async function, you get a "coroutine never
> awaited" warning. This may or may not be what you want.
>

It should indicate a bug, and the equivalent bug is silent when you're
using yield-from, so I see this as a positive. If you find yourself
designing an API where abandoning an async function is a valid action you
should probably think twice.
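
A small sketch of the difference, assuming CPython's behavior of emitting
a RuntimeWarning when an unstarted coroutine object is finalized. The
helper name runtime_warnings_when_abandoned is invented for this example.

```python
import gc
import warnings


async def async_step():
    return "done"


def gen_step():
    yield "done"


def runtime_warnings_when_abandoned(factory):
    """Create an object with `factory`, drop it unused, collect warnings."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        factory()     # create the object and immediately abandon it
        gc.collect()  # force finalization on non-refcounting interpreters
    # Keep only RuntimeWarnings so unrelated garbage does not add noise.
    return [str(w.message) for w in caught
            if issubclass(w.category, RuntimeWarning)]


async_warnings = runtime_warnings_when_abandoned(async_step)
gen_warnings = runtime_warnings_when_abandoned(gen_step)
```

The abandoned async function produces a "never awaited" warning, while the
abandoned generator is dropped silently.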


> - async/await has associated thread-global state like
> sys.set_coroutine_wrapper and sys.set_asyncgen_hooks. Generally async
> libraries assume that they own these, and arbitrarily weird things may
> happen if you have multiple async/await coroutine runners in the same
> thread with no coordination between them.
>

The existence of these is indeed a bit unfortunate for this use case. I'm
CC'ing Yury to ask him if he can think of a different way to deal with the
problems that these are supposed to solve. For each, the reason they exist
is itself an edge case -- debugging for the former, finalization for the
latter. A better solution for these problems may also be important for
situations where multiple event loops exist (in the same thread, e.g.
running alternately). Maybe a context manager could be used to manage this
state better?


> - In async/await, it's not obvious how to write leaf functions:
> 'await' is equivalent to 'yield from', but there's no equivalent to
> 'yield'. You have to jump through some hoops by writing a class with a
> custom __await__ method or using @types.coroutine. Of course it's
> doable, and it's no big deal if you're writing a proper async library,
> but it's awkward for quick ad hoc usage.
>

Ah, yes, you need the equivalent of a Future. Maybe we should have a simple
one in the stdlib that's not tied to asyncio.
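
For illustration only, a minimal sketch of what such a leaf could look
like today using a class with a custom __await__ method. The class name
Suspend and the function read_two are hypothetical; this is not an
existing stdlib type, just one way to get the effect of a bare 'yield'
inside async/await code.

```python
class Suspend:
    """Hypothetical minimal awaitable, not tied to asyncio.

    Awaiting it hands `request` to the coroutine runner and evaluates to
    whatever the runner passes back via send().
    """

    def __init__(self, request):
        self.request = request

    def __await__(self):
        # __await__ must return an iterator; a generator does the job, and
        # its bare 'yield' is the actual suspension point.
        reply = yield self.request
        return reply


async def read_two():
    first = await Suspend("first")
    second = await Suspend("second")
    return [first, second]


# Drive the coroutine by hand, without any event loop.
coro = read_two()
seen = [coro.send(None)]     # start; runs up to the first suspension
seen.append(coro.send("x"))  # answer the first request
try:
    coro.send("y")           # answer the second; the coroutine then returns
except StopIteration as stop:
    result = stop.value
```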


> For a concrete example of 'ad hoc coroutines' where I think 'yield
> from' is appropriate, here's wsproto's old 'yield from'-based
> incremental websocket protocol parser:
>
>     https://github.com/python-hyper/wsproto/blob/4b7db502cc0568ab2354798552148dadd563a4e3/wsproto/frame_protocol.py#L142
>

Ah yes, these kinds of parsers are interesting use cases for coroutines.
There are many more potential use cases than protocol parsers -- e.g. the
Python REPL could really use one if you want to replicate it completely in
pure Python.


> The flow here is: received_frames is the public API: it gives you an
> iterator over all completed frames. When it stops you're expected to
> add more data to the buffer and then call it again. Internally,
> received_frames acts as a coroutine runner for parse_more_gen, which
> is the main parser that calls various helper methods to parse
> different parts of the websocket frame. These calls eventually bottom
> out in _consume_exactly or _consume_at_most, which use 'yield' to
> "block" until enough data is available in the internal buffer.
> Basically this is the classic trick of using coroutines to write an
> incremental state machine parser as ordinary-looking code where the
> state is encoded in local variables on the stack.
>
> Using coroutines here isn't just a cute trick; I'm pretty confident
> that there is absolutely no other way to write a readable incremental
> websocket parser in Python. This is the 3rd rewrite of wsproto's
> parser, and I think I've read the code for all the other Python
> libraries that do this too. The websocket framing format is branchy
> enough that trying to write out the state machine explicitly will
> absolutely tie you in knots. (Of course we then rewrote wsproto's
> parser a 4th time for py2 compatibility; the current version's not
> *terrible* but the 'yield from' version was simpler and more
> maintainable.)
>

No argument here.
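
For readers who haven't seen the pattern, here is a toy sketch of the
structure described in the quoted text. It is not wsproto's code: it
assumes a made-up framing format (one length byte followed by that many
payload bytes), and the names consume_exactly, parse_frame and Parser are
only loosely modeled on the description above.

```python
def consume_exactly(buffer, nbytes):
    # Leaf: a bare 'yield' "blocks" until the buffer holds enough bytes.
    while len(buffer) < nbytes:
        yield
    data = bytes(buffer[:nbytes])
    del buffer[:nbytes]
    return data


def parse_frame(buffer):
    # Reads like straight-line code; the parser state lives in the locals.
    header = yield from consume_exactly(buffer, 1)
    payload = yield from consume_exactly(buffer, header[0])
    return payload


class Parser:
    def __init__(self):
        self._buffer = bytearray()
        self._gen = None  # the frame parse currently in progress, if any

    def receive_bytes(self, data):
        # Extend in place so a suspended generator sees the new bytes.
        self._buffer.extend(data)

    def received_frames(self):
        # Coroutine runner: yields each completed frame and stops when
        # the parser is waiting for more data.
        while True:
            if self._gen is None:
                self._gen = parse_frame(self._buffer)
            try:
                next(self._gen)
            except StopIteration as stop:
                frame = stop.value
            else:
                return  # suspended: the caller must feed more bytes
            self._gen = None
            yield frame


parser = Parser()
parser.receive_bytes(b"\x03ab")           # frame is split mid-payload
first_batch = list(parser.received_frames())
parser.receive_bytes(b"c\x01z\x02q")      # rest of frame 1, frame 2, part of 3
second_batch = list(parser.received_frames())
```

The caller alternates between receive_bytes() and received_frames(); a
frame that is split across reads is simply resumed where it left off.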


> For wsproto's use case, I think using 'await' would be noticeably
> worse than 'yield from'. It'd make the code more opaque to readers
> (people know generators but no-one shows up already knowing what
> @types.coroutine does),


That's "the devil you know" though. I expect few people have a solid
understanding of how a bare "yield" works when there's also a "yield from".


> the "coroutine never awaited" warnings would
> be obnoxious (it's totally fine to instantiate a parser and then throw
> it away without using it!), and the global state issues would make us
> very nervous (wsproto is absolutely designed to be used alongside a
> library like asyncio or trio). But that's fine; 'yield from' exists
> and is perfect for this application.
>
> Basically this is a very long way of saying that actually the status
> quo is pretty good, at least with regard to yield from vs. async/await
> :-).
>

Yeah. I'm buying maybe 75% of it. But async/await is still young...

-- 
--Guido van Rossum (python.org/~guido)