[Async-sig] APIs for backpressure (was: Re: Some thoughts on asynchronous API design in a post-async/await world)

Glyph Lefkowitz glyph at twistedmatrix.com
Wed Nov 9 21:43:46 EST 2016


> On Nov 9, 2016, at 7:59 AM, Cory Benfield <cory at lukasa.co.uk> wrote:
> 
> 
>> On 8 Nov 2016, at 21:07, Glyph Lefkowitz <glyph at twistedmatrix.com> wrote:
>> 
>> 
>>> On Nov 8, 2016, at 2:17 AM, Cory Benfield <cory at lukasa.co.uk> wrote:
>>> 
>>> Tubes is unquestionably a better API for this, but suffers from a lack of accessibility.
>> 
>> What is "accessibility" in this context?
> 
> “accessibility” in this context is essentially the collection of things that make it easy for users to a) identify a need for tubes, b) work out how to plug tubes into their application, and c) have a sensible evolution to handle evolving backpressure needs into tubes.

I see what you mean.  I've struggled with this problem a lot myself.

> Mostly this is a documentation thing, but there’s also a chicken-and-egg problem here, specifically: tubes provides a high-level API for flow control but requires that pre-existing code use a low-level one. How do we get from there to somewhere we can actually tell people “yeah, go use tubes”?

Aah, yes.  This is exactly where the project is stuck.  A large amount of infrastructure has to be retrofitted to be Tubes all the way down before you can make them useful.

And the problem isn't just with Tubes at the infrastructure level; most applications are fundamentally not stream processing, and it will be idiomatically challenging to express them as such.  HTTP connections are short-lived, and the interfaces for comprehending their contents (e.g. 'json.loads') are not themselves stream-oriented.  Even if you did have a stream-oriented JSON parser, expressing what you want from it is hard; you want a data structure from which you can inspect multiple elements simultaneously, not a stream of "object began" / "string began" / "string ended" / "list began" / "list ended" events.
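
To make that contrast concrete, here is a minimal sketch in plain Python.  The `walk_events` helper is hypothetical and only imitates the shape of a streaming parser's output by walking an already-parsed value; a real streaming parser would emit such events while still reading bytes.

```python
import json


def walk_events(value):
    """Yield SAX-style events for an already-parsed JSON value.

    Hypothetical helper: it only illustrates what an event stream looks
    like, it does not parse incrementally.
    """
    if isinstance(value, dict):
        yield ("object began", None)
        for key, item in value.items():
            yield ("key", key)
            yield from walk_events(item)
        yield ("object ended", None)
    elif isinstance(value, list):
        yield ("list began", None)
        for item in value:
            yield from walk_events(item)
        yield ("list ended", None)
    else:
        yield ("scalar", value)


document = '{"user": {"name": "alice", "tags": ["a", "b"]}}'

# Whole-document style: every element is reachable at once.
tree = json.loads(document)
name, first_tag = tree["user"]["name"], tree["user"]["tags"][0]

# Event style: the consumer must track its own position in the structure
# in order to pull out the same two values.
events = list(walk_events(tree))
```

Getting `name` and `first_tag` out of `events` requires the consumer to keep its own stack of where it is in the document, which is the idiomatic difficulty described above.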

> On top of that we have: how do we justify using tubes when so much of for example Twisted’s codebase does not implement IPushProducer/IConsumer?

I think this may be slightly misleading.  You're making it sound like there is a proliferation of transports or stream interfaces that don't provide these interfaces, but should.  At a low level, almost everything in the Twisted codebase which is actually a _stream_ of data (rather than a request/response) does implement these interfaces, or has a public `transport` attribute which does so.  For example, the HTTP response object that comes back from Agent does have a transport.

The issue is not that the interfaces aren't provided on specific objects where they should be, but that the entire shape of the arbitrarily-large-request/arbitrarily-large-response pattern expects to be able to store whole responses.  By the time you get to the layer which "doesn't implement" IPushProducer/IConsumer, you're at a level where such an implementation would be meaningless, or unhelpful.
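
For readers who haven't used these interfaces, the pause/resume contract looks roughly like the sketch below.  It is self-contained and does not import Twisted; the method names mirror IPushProducer (`pauseProducing`, `resumeProducing`, `stopProducing`) and IConsumer (`registerProducer`, `unregisterProducer`, `write`), but the `ListProducer` and `BoundedConsumer` classes, the `drain` method and the `limit` parameter are assumptions made up for illustration.  As a simplification, the producer here only starts once `resumeProducing` is called.

```python
class ListProducer:
    """Push producer over an in-memory list (hypothetical, for illustration)."""

    def __init__(self, items, consumer):
        self._items = list(items)
        self._consumer = consumer
        self._paused = False
        self._stopped = False
        consumer.registerProducer(self, True)  # True: streaming (push) producer

    def pauseProducing(self):
        self._paused = True

    def resumeProducing(self):
        self._paused = False
        # Keep writing until the consumer pauses us or we run out of data.
        while self._items and not self._paused and not self._stopped:
            self._consumer.write(self._items.pop(0))
        if not self._items and not self._stopped:
            self._stopped = True
            self._consumer.unregisterProducer()

    def stopProducing(self):
        self._stopped = True


class BoundedConsumer:
    """Consumer that pauses its producer once `limit` items are buffered."""

    def __init__(self, limit):
        self.limit = limit
        self.buffer = []
        self.producer = None
        self.pauses = 0

    def registerProducer(self, producer, streaming):
        self.producer = producer

    def unregisterProducer(self):
        self.producer = None

    def write(self, data):
        self.buffer.append(data)
        if len(self.buffer) >= self.limit and self.producer is not None:
            self.pauses += 1
            self.producer.pauseProducing()

    def drain(self):
        """Hand back the buffered items, then let the producer continue."""
        drained, self.buffer = self.buffer, []
        if self.producer is not None:
            self.producer.resumeProducing()
        return drained


consumer = BoundedConsumer(limit=2)
producer = ListProducer(range(5), consumer)
producer.resumeProducing()   # writes two items, then gets paused
first = consumer.drain()
second = consumer.drain()
third = consumer.drain()
```

This only makes sense for an object that is a stream; a layer that hands back one fully buffered response has nothing to pause.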

> How do people migrate a pre-existing codebase to something like tubes? How do people extend tubes to do something other than *propagate* backpressure (e.g. to implement a fast-fail path to error out rather than stop reading from a socket). All of these questions *have* answers, but those answers aren’t easily accessible.

I think one way to start to drill into this would be (sorry for the over-specificity to Twisted here) to address something like <https://twistedmatrix.com/trac/ticket/288> with Tubes.  There are at least a few clear-cut cases where we _do_ have a large stream of data which needs to be directed to an appropriate location, and the sooner we can make the standard interface for that into "return a Fount", the sooner we can start to make Tubes just as much a part of the idiomatic lexicon of async I/O as awaitables, Futures or Deferreds.  Pulling a dependency like this into asyncio would obviously be challenging, but 3rd-party packages like Twisted - or, for that matter, aiohttp! - could start to depend on tubes as-is.
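
On the question of propagating backpressure versus a fast-fail path, the difference can be sketched with nothing more than a bounded `asyncio.Queue`.  This is not the Tubes API; the `propagate`, `fail_fast` and `demo` names are hypothetical, and the queue size of 2 is arbitrary.

```python
import asyncio


async def propagate(queue, item):
    """Propagate backpressure: wait until the consumer makes room."""
    await queue.put(item)
    return True


def fail_fast(queue, item):
    """Fast-fail: reject the item immediately when the buffer is full."""
    try:
        queue.put_nowait(item)
    except asyncio.QueueFull:
        return False
    return True


async def demo():
    queue = asyncio.Queue(maxsize=2)

    # Two items fit; the third is rejected instead of waiting.
    accepted = [fail_fast(queue, n) for n in range(3)]

    # A propagating producer is suspended until the consumer takes an item.
    blocked = asyncio.create_task(propagate(queue, "late"))
    await asyncio.sleep(0)          # let the task run up to the full queue
    was_blocked = not blocked.done()
    queue.get_nowait()              # consumer frees one slot
    await blocked
    return accepted, was_blocked, queue.qsize()


result = asyncio.run(demo())
```

Which of the two a given layer wants is an application decision; the point is that both should be equally easy to express.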

> Part of this is an ongoing cultural problem which is that people who build small or non-distributed applications often don’t have to think about backpressure, so there’s another problem that also needs addressing: it needs to be so easy for people to extend their async producers and consumers of data to propagate and respond to backpressure appropriately that there’s no good reason *not* to do it.

The extension of this cultural problem is that most of the tools used in large distributed systems are built as hobby projects for small, non-distributed systems, so even at scale and in these environments we still find ourselves fighting with layers that don't want to deal with backpressure properly.

More importantly, backpressure at scale in distributed systems often means really weird stuff, like traffic shaping on a front-end tier by coordinating with a data store or back-end tier to identify problem networks or network ranges.  Tubes operates at a simpler level: connections are individual entities, and backpressure is applied uniformly across all of them.  Granted, this is the basic layer you need in place to make addressing backpressure throughout a system work properly, but it's also not an exciting product that solves a super hard or complex problem.

> All of this complex mess of things is what I mean by “accessibility”. It needs to be easier to do the right thing than the wrong thing.

I'm definitely open to more ideas on this topic.  Retrofitting backpressure into existing systems is hard, and harder still when you're trying to expose an idiomatic, high-level API.