[Async-sig] async/sync library reusage

Fri Jun 9 05:06:39 EDT 2017

> On 9 Jun 2017, at 06:48, Nathaniel Smith <njs at pobox.com> wrote:
> 
> I would say that this is something that we as a community are still
> figuring out. I really like the Sans-IO approach, and it's a really
> valuable piece of the solution, but it doesn't solve the whole problem
> by itself - you still need to actually do I/O, and this means things
> like error handling and timeouts that aren't obviously a natural fit
> to the Sans-IO approach, and this means you may still have some tricky
> code that can end up duplicated. (Or maybe the Sans-IO approach can be
> extended to handle these things too?) There are active discussions
> happening in projects like urllib3 [1] and packaging [2] about what
> the best strategy to take is. And the options vary a lot depending on
> whether you need to support python 2 etc.

Let me take a moment to elaborate on some of the thinking that has gone on for urllib3/Requests. We have an unusual set of constraints that are worth understanding, and so I’ll throw out all the ideas we had and why they were rejected (and indeed, why you may not want to reject them).

1. Implement the core library in asyncio, add a synchronous shim on top of it in terms of the asyncio event loop's run_until_complete().

This works great in many ways: you get a nice async-based library implementation, you correctly prioritise people using the async case over those using the synchronous one, and you can expect wide support and interop thanks to asyncio’s role as the common event loop implementation. However, you don’t support more novel async paradigms like those used by curio and trio.

More damningly for urllib3/Requests, this also limits your supported Python versions to 3.5 and later. There are also some efficiency concerns. Finally, unless you’re willing to support only 3.7, you end up needing to pass loop arguments around, which is pretty gross.
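
As a rough sketch of what option 1 looks like in practice: the names fetch_async() and fetch() below are hypothetical, and the asyncio.sleep(0) call merely stands in for real network I/O. The synchronous shim does nothing more than drive the coroutine to completion on a private event loop.

```python
import asyncio


async def fetch_async(url):
    # Hypothetical async core. A real library would await socket I/O here;
    # asyncio.sleep(0) is only a stand-in that yields to the event loop once.
    await asyncio.sleep(0)
    return "response for " + url


def fetch(url):
    # Synchronous shim: create a private loop, run the coroutine until it
    # finishes, and always close the loop afterwards.
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(fetch_async(url))
    finally:
        loop.close()


if __name__ == "__main__":
    print(fetch("http://example.invalid/"))
```

Note that a shim like this cannot be called from code that is already running inside an event loop, which is one reason the synchronous case ends up as the second-class citizen.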

2. Have an abstract low-level I/O interface and “bleach” it (remove the keywords async/await) on Python 2.

This would require you to write all your code in terms of a small number of abstract I/O operations with “async” in front of their name, e.g. “async def send”, “async def recv”, and so on. You can then implement these across multiple I/O backends, and also provide a synchronous one that still has “async” in front of it and just doesn’t ever use the word “await”. You can then provide a code transformation at install time on Python 2 that transforms that codebase, removing all the words “async” and “await” and leaving behind a synchronous-only codebase.
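
To make the “bleaching” idea concrete, here is a deliberately naive sketch. Everything in it (bleach(), SyncBackend, send_request()) is made up for illustration, and the regular expression is an assumption of convenience: a real install-time tool would need to work on tokens so that it does not touch strings or comments.

```python
import re

# Library code written once, in terms of abstract async I/O operations.
ASYNC_SOURCE = '''
async def send_request(backend, data):
    await backend.send(data)
    return await backend.recv()
'''


def bleach(source):
    # Naive transformation: drop the keywords "async" and "await" wherever
    # they appear as whole words followed by whitespace.
    return re.sub(r"\b(?:async|await)\s+", "", source)


class SyncBackend:
    """Hypothetical blocking backend standing in for a real socket."""

    def __init__(self):
        self._buffer = b""

    def send(self, data):
        self._buffer = data

    def recv(self):
        # Echo the data back upper-cased so the round trip is visible.
        return self._buffer.upper()


# What the install-time step would produce: plain synchronous functions.
namespace = {}
exec(bleach(ASYNC_SOURCE), namespace)
send_request = namespace["send_request"]

if __name__ == "__main__":
    print(bleach(ASYNC_SOURCE))
    print(send_request(SyncBackend(), b"ping"))
```

The bleached output contains no async syntax at all, so it would also compile on Python 2.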

The advantages here are better support for novel async paradigms (e.g. curio and trio), the ability to write more native backends for non-asyncio I/O models (e.g. Twisted/Tornado), and having a single codebase that handles sync and async.

There are many disadvantages. The first is the most obvious: the code your users run is not the same as the code you shipped. While the transformation is small and pretty easy to understand, that doesn’t remove its risks. It also makes debugging harder and more painful. On top of that, your Python 3 synchronous code looks pretty ugly because you have to write the word “await” around it even though it is not in fact asynchronous (technically you *don’t* have to do that but I guarantee IDEs will get mad).
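
For the Python 3 synchronous case, one way to picture it is the sketch below. The names recv_sync(), read_response() and run_sync() are hypothetical. It relies on the fact that a coroutine which never actually suspends can be run to completion without any event loop, by stepping it once and catching StopIteration.

```python
async def recv_sync(data):
    # "Synchronous" backend operation: declared async so that it fits the
    # abstract interface, but it never awaits anything that suspends.
    return data


async def read_response(data):
    # Shared library code: has to say "await" even though, with this
    # backend, nothing asynchronous ever happens.
    body = await recv_sync(data)
    return body.decode("ascii")


def run_sync(coro):
    # Step the coroutine once. If it never suspends, it finishes right away
    # and its return value arrives as StopIteration.value.
    try:
        coro.send(None)
    except StopIteration as exc:
        return exc.value
    # The coroutine yielded, so it really does need an event loop.
    coro.close()
    raise RuntimeError("coroutine suspended; it needs a real event loop")


if __name__ == "__main__":
    print(run_sync(read_response(b"ok")))
```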

More subtly, this causes problems for backpressure and task management on event loops. It turns out defining your low-level I/O primitives is not trivial. In urllib3’s case, one of the things we’d need is either the equivalent of ‘async def select()’ or ‘async def new_task()’. In the first case, writing this would require careful management of futures/deferreds and various bits of state in order to correctly suspend execution on event loops. In the second case, the synchronous version of this is called “threading.Thread” and that has a number of issues. I’d say that if you’re going to use threads you may as well just always use threads, but more importantly it has substantially different semantics to all async task management, which makes it difficult to reason about and to ensure that the code is sensible.

This approach is also entirely untested, at any scale. It’s simply not clear that it works yet. All the tooling would need to be written.

3. Just use Twisted/Tornado.

This variation on number (1) turns out to get you surprisingly close to our actual goal. Twisted and Tornado support Python 2 and Python 3; when async/await are present they integrate fairly nicely with them; and they give you the added advantage of allowing your Python 2 users to write asynchronous code so long as they buy into the relevant async ecosystem. It also means that you can use the run_until_complete model for your Python 2 synchronous code.

However, these also have some downsides. Twisted, the library I know better, doesn’t yet integrate as cleanly with async/await as we’d like: that’s coming sometime this year, probably with the landing of 3.7. Additionally, Twisted has no equivalent of the asyncio event loop's run_until_complete(), which would mean that someone would have to add the relevant Twisted support (either restartable or instantiable reactors, neither of which Twisted has yet).

This also adds a potentially sizeable external dependency, which isn’t necessarily all that fun.

4. ??? Who knows.

Right now there is no clarity about what we’re going to do. It’s possible that the answer will end up being “nothing at the moment” and that we’ll wait for the ecosystem to progress for a while before making the change. Either way, it’s clear that there is no easy answer to this problem.

Cory
