[Async-sig] async/sync library reusage

Yarko Tymciurak yarkot1 at gmail.com
Fri Jun 9 15:52:36 EDT 2017


...so I really am enjoying the conversation.

Guido - re: "vision too far out":  yes, for people struggling with
async support in their libraries, now... but that is also part of my
motivation.   Python 5?  Sure...  (I may have to watch it come into use from
the grave, but hopefully not... ;-) ).  Anyway, from back-porting and
tactical "implement now" concerns, to plans for the next release, to plans for
the next version of Python, to brainstorming much less concrete future versions
- all form an interesting continuum.

Re:  GIL... sure, sort of, and sort of not.  I was thinking "as long as
major changes are going on...  think about additional structural
changes..."   More to the point:  as I see it, people have a hard time
thinking about async in the cooperative-multitasking (CMT) sense, and thus
disappointments happen around blocking (missed or unexpected, e.g. hardware
failures).   Cory (in his reply - and, yeah: nice writeup!) hints at what I
generally like, structurally:

"...we’d ideally treat asyncio as the first-class citizen and retrofit on
the threaded support, rather than the other way around"

Structurally, async tasks are lightweight compared to threads, which
are lightweight compared to processes, so a natural app flow seems to go
from lightest-weight on out.  To me, this seems practical for making
life easier for developers, because you can imagine "promoting" an async
task caught unexpectedly blocking to a thread, while still having the
lightest-weight loop retain control over it (promotion out, as well as
cancellation while promoted).
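A minimal sketch of that "promotion" idea using today's asyncio primitives (run_in_executor); the names blocking_io and main are illustrative, and a real implementation would detect the blocking rather than know it up front:

```python
import asyncio
import concurrent.futures
import time

def blocking_io():
    # Stand-in for work that turned out to block (e.g. a hardware stall).
    time.sleep(0.1)
    return "done"

async def main():
    loop = asyncio.get_running_loop()
    # "Promote" the blocking work to a worker thread; the event loop keeps
    # the resulting future, so it can still await it, time it out, or
    # cancel its own side while the work is promoted.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        result = await loop.run_in_executor(pool, blocking_io)
    return result

print(asyncio.run(main()))  # prints "done"
```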

As for multiple task loops, or loops off in a thread, I haven't thought
about it too much, but this seems neither new nor unreasonable.  I'm
thinking of the base stations we talk over in our mobile connections, which
are multiple diskless servers that hot-promote to "master" server status on
hardware failure (or live capacity upgrade, i.e. inserting processors).
This pattern seems both reasonable and useful in this context, i.e. the
concept of a master loop (which implies communication/control channels - a
complication).  With some thought, some reasonable ground rules, and some
simplifications, I would expect much can be done.

Appreciate the discussions!

- Yarko

On Fri, Jun 9, 2017 at 1:23 PM, Guido van Rossum <guido at python.org> wrote:

> Great write-up! I actually find the async nature of HTTP (both versions) a
> compelling reason to switch to asyncio. For HTTP/1.1 this sounds mostly
> like it would make the implementation easier; for HTTP/2 it sounds like it
> would just be better for the user-side as well (if the user just wants one
> resource they can safely continue to use the synchronous HTTP/1.1 version
> of the API.)
>
> On Fri, Jun 9, 2017 at 9:55 AM, Cory Benfield <cory at lukasa.co.uk> wrote:
>
>>
>> On 9 Jun 2017, at 17:28, Guido van Rossum <guido at python.org> wrote:
>>
>> At least one of us is still confused. The one-event-loop-per-thread model
>> is supported in asyncio without passing the loop around explicitly. The
>> get_event_loop() implementation stores all its state in a thread-local
>> instance, so it returns the thread's event loop. (Because this is an
>> "advanced" model, you have to explicitly create the event loop with
>> new_event_loop() and make it the default loop for the thread with
>> set_event_loop().)
>>
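The per-thread setup Guido describes can be sketched as follows (a minimal sketch; the worker function name is illustrative):

```python
import asyncio
import threading

results = []

def worker():
    # One loop per thread: create it explicitly with new_event_loop() and
    # register it as this thread's default with set_event_loop().
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    try:
        # get_event_loop() now returns this thread's loop from
        # thread-local state, with no loop-passing needed.
        results.append(asyncio.get_event_loop() is loop)
        loop.run_until_complete(asyncio.sleep(0))
    finally:
        loop.close()

t = threading.Thread(target=worker)
t.start()
t.join()
print(results)  # prints [True]
```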
>>
>> Aha, ok, so the confused one is me. I did not know this. =) That
>> definitely works a lot better. It admittedly works less well if someone is
>> doing their own custom event loop stuff, but that’s probably an acceptable
>> limitation up until the time that Python 2 goes quietly into the night.
>>
>> All in all, I'm a bit curious why you would need to use asyncio at all
>> when you've got a thread per request anyway.
>>
>>
>> Yeah, so this is a bit of a diversion from the original topic of this
>> thread but I think it’s an idea worth discussing in this space. I want to
>> reframe the question a bit if you don’t mind, so shout if you think I’m not
>> responding to quite what you were asking. In my understanding, the question
>> you’re implicitly asking is this:
>>
>> "If you have a thread-safe library today (that is, one that allows users
>> to do threaded I/O with appropriate resource pooling and management), why
>> move to a model built on asyncio?”
>>
>> There are many answers to this question that differ for different
>> libraries with different uses, but for HTTP libraries like urllib3 here are
>> our reasons.
>>
>> The first is that it turns out that even for HTTP/1.1 you need to write
>> something that amounts to a partial event loop to properly handle the
>> protocol. Good HTTP clients need to watch for responses while they’re
>> uploading body data because if a response arrives during that process body
>> upload should be terminated immediately. This is also required for sensibly
>> handling things like Expect: 100-continue, as well as spotting other
>> intermediate responses and connection teardowns sensibly and without
>> throwing exceptions.
>>
>> Today urllib3 does not do this, and it has caused us pain, so our v2
>> branch includes a backport of the Python 3 selectors module and a
>> hand-written partially-complete event loop that only handles the specific
>> cases we need. This is an extra thing for us to debug and maintain, and
>> ultimately it’d be easier to just delegate the whole thing to event loops
>> written by others who promise to maintain them and make them efficient.
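The "partial event loop" shape Cory describes can be sketched with the stdlib selectors module: while writing the request body, also watch the socket for an early response. This is a toy illustration, not urllib3's actual code; upload_with_watch and the chunk handling are invented for the example:

```python
import selectors
import socket

def upload_with_watch(sock, chunks):
    # Register for both readability and writability: a good client must
    # notice a response arriving *during* body upload and stop sending.
    sel = selectors.DefaultSelector()
    sel.register(sock, selectors.EVENT_READ | selectors.EVENT_WRITE)
    chunks = iter(chunks)
    pending = next(chunks, None)
    try:
        while pending is not None:
            for key, events in sel.select(timeout=1.0):
                if events & selectors.EVENT_READ:
                    # Peer spoke early (e.g. a 100 Continue or an error
                    # response): terminate the upload and return the data.
                    return sock.recv(65536)
                if events & selectors.EVENT_WRITE and pending is not None:
                    sock.sendall(pending)
                    pending = next(chunks, None)
        return b""
    finally:
        sel.close()

# Demo with a local socket pair: the "server" responds mid-upload.
a, b = socket.socketpair()
b.sendall(b"HTTP/1.1 100 Continue\r\n\r\n")
print(upload_with_watch(a, [b"body-chunk"]))
a.close()
b.close()
```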
>>
>> The second answer is that I believe good asyncio support in libraries is
>> a vital part of the future of this language, and “good” asyncio support IMO
>> does as little as possible to block the main event loop. Running all of the
>> complex protocol parsing and state manipulation of the Requests stack on a
>> background thread is not cheap, and involves a lot of GIL swapping around.
>> We have found several bug reports complaining about using Requests with
>> largish numbers of threads, indicating that our big stack of Python code
>> really does cause contention on the GIL if used heavily. In general, having
>> to defer to a thread to run *Python* code in asyncio is IMO a nasty
>> anti-pattern that should be avoided where possible. It is much less bad to
>> defer to a thread to then block on a syscall (e.g. to get an “async”
>> getaddrinfo), but doing so to run a big stack of Python code is vastly
>> less pleasant for the main event loop.
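The "less bad" case Cory mentions, deferring a blocking syscall to a thread, looks like this sketch (asyncio's own loop.getaddrinfo does essentially the same thing internally):

```python
import asyncio
import socket

async def main():
    loop = asyncio.get_running_loop()
    # Deferring a blocking *syscall* to a thread is relatively cheap: the
    # worker thread spends its time blocked in C code with the GIL
    # released, rather than contending for the GIL running Python.
    return await loop.run_in_executor(
        None, socket.getaddrinfo, "localhost", 80
    )

infos = asyncio.run(main())
print(f"{len(infos)} address(es) resolved for localhost")
```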
>>
>> For this reason, we’d ideally treat asyncio as the first-class citizen
>> and retrofit on the threaded support, rather than the other way around.
>> This goes doubly so when you consider the other reasons for wanting to use
>> asyncio.
>>
>> The third answer is that HTTP/2 makes all of this much harder. HTTP/2 is
>> a *highly* concurrent protocol. Connections send a lot of control frames
>> back and forth that are invisible to the user working at the semantic HTTP
>> level but that nonetheless need relatively low-latency turnaround (e.g.
>> PING frames). It turns out that in the traditional synchronous HTTP model
>> urllib3 only gets access to the socket to do work when the user calls into
>> our code. If the user goes a “long” time without calling into urllib3, we
>> take a long time to process any data off the connection. In the best case
>> this causes latency spikes as we process all the data that queued up in the
>> socket. In the worst case, this causes us to lose connections we should
>> have been able to keep because we failed to respond to a PING frame in a
>> timely manner.
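The asyncio shape that avoids this is a dedicated reader task that owns the connection, so control frames are serviced promptly even while the user's code is busy elsewhere. A toy sketch, with a queue standing in for the socket and invented names (service_connection, handle_frame), not a real HTTP/2 API:

```python
import asyncio

async def service_connection(incoming, handle_frame):
    # Runs independently of user calls: frames are processed as they
    # arrive, so e.g. a PING can be answered with low latency.
    while True:
        frame = await incoming.get()   # stand-in for reading off the socket
        if frame is None:              # connection closed
            break
        handle_frame(frame)

async def main():
    incoming = asyncio.Queue()
    seen = []
    task = asyncio.create_task(service_connection(incoming, seen.append))
    await incoming.put("PING")
    await incoming.put(None)
    await task
    return seen

print(asyncio.run(main()))  # prints ['PING']
```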
>>
>> My experience is that purely synchronous libraries handling HTTP/2 simply
>> cannot provide a positive user experience. HTTP/2 flat-out *requires*
>> either an event loop or a dedicated background thread, and in practice in
>> your dedicated background thread you’d also just end up writing an event
>> loop (see answer 1 again). For this reason, it is basically mandatory for
>> HTTP/2 support in Python to either use an event loop or to spawn out a
>> dedicated C thread that does not hold the GIL to do the I/O (as this thread
>> will be regularly woken up to handle I/O events).
>>
>> Hopefully this (admittedly horrifyingly long) response helps illuminate
>> why we’re interested in asyncio support. It should be noted that if we find
>> ourselves unable to get it in the short term we may simply resort to
>> offering an “async” API that involves us doing the rough equivalent of
>> running in a thread-pool executor, but I won’t be thrilled about it. ;)
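The fallback Cory describes, an "async" API that just wraps the synchronous code in a thread-pool executor, amounts to this sketch; request_sync is a hypothetical stand-in for a library's blocking entry point:

```python
import asyncio
import functools

def request_sync(url):
    # Hypothetical blocking entry point (imagine the full Requests stack).
    return f"response for {url}"

async def request_async(url):
    # The whole synchronous stack runs on a worker thread, which is exactly
    # the GIL-contending pattern criticized above -- hence the ";)".
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(
        None, functools.partial(request_sync, url)
    )

print(asyncio.run(request_async("http://example.com")))
```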
>>
>> Cory
>>
>
>
>
> --
> --Guido van Rossum (python.org/~guido)
>
> _______________________________________________
> Async-sig mailing list
> Async-sig at python.org
> https://mail.python.org/mailman/listinfo/async-sig
> Code of Conduct: https://www.python.org/psf/codeofconduct/
>
>