[Web-SIG] Server-side async API implementation sketches

Alice Bevan–McGregor alice at gothcandy.com
Sun Jan 9 12:36:14 CET 2011


On 2011-01-08 19:34:41 -0800, P.J. Eby said:

> At 04:40 AM 1/9/2011 +0200, Alex Grönholm wrote:
>> 09.01.2011 04:15, Alice Bevan–McGregor wrote:
>>> I hope that clearly identifies my idea on the subject. Since async 
>>> servers will /already/ be implementing their own executors, I don't 
>>> see this as too crazy.
>> -1 on this. Those executors are meant for executing code in a thread 
>> pool. Mandating a magical socket operation filter here would 
>> considerably complicate server implementation.
> 
> Actually, the *reverse* is true.  If you do it the way Alice proposes, 
> my sketches don't get any more complex, because the filtering goes in 
> the executor facade or submit function.

Indeed; the executor is what then adds the file descriptor to the 
underlying server async reactor (select/epoll/kqueue/other).  In the 
case of the Marrow server, this would utilize a reactor callback (some 
might say "deferred") to update the Future instance with the data, 
setting completion status, executing callbacks, etc.  One might even be 
able to use a threading.Event (or whatever is the opposite of a lock) 
to wake up blocking .result() calls, even if not multi-threaded 
(greenthreads, etc.).
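
A minimal sketch of that flow, assuming Python 3's selectors module as 
the reactor.  ReactorExecutor, submit_recv and run_once are hypothetical 
names for illustration only; this is not Marrow's actual API:

```python
import selectors
import socket
from concurrent.futures import Future


class ReactorExecutor:
    """Hypothetical facade: a socket read becomes a reactor callback."""

    def __init__(self):
        self.selector = selectors.DefaultSelector()

    def submit_recv(self, sock, nbytes):
        future = Future()
        sock.setblocking(False)

        def on_readable():
            # Reactor callback: fill in the Future, which sets its
            # completion status and fires any done-callbacks.
            self.selector.unregister(sock)
            try:
                future.set_result(sock.recv(nbytes))
            except Exception as exc:
                future.set_exception(exc)

        # Hand the file descriptor to the underlying reactor.
        self.selector.register(sock, selectors.EVENT_READ, on_readable)
        return future

    def run_once(self, timeout=1.0):
        # One pass of the reactor loop: call back every ready socket.
        for key, _events in self.selector.select(timeout):
            key.data()


def demo():
    a, b = socket.socketpair()
    executor = ReactorExecutor()
    future = executor.submit_recv(a, 4096)
    b.sendall(b"hello")
    executor.run_once()
    data = future.result(timeout=0)  # already complete, does not block
    executor.selector.close()
    a.close()
    b.close()
    return data
```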

Of course, adding the file descriptor to a pure async reactor then 
.result() blocking on it from your application would result in a 
deadlock; the .result() would never complete as the reactor would never 
get a chance to perform the pending request.  (This is why Marrow 
requires threading be enabled globally before adding an executor to the 
environment; this requires rather explicit documentation.)  This 
problem is solved completely by yielding the future instance (pausing 
the application) to let the reactor do its thing.  (Yielding the future 
becomes a replacement for the blocking behaviour of future.result().)
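
A rough sketch of how a server might resume an application that yields a 
future, assuming Python 3 generators (which can return a value).  The 
run() helper and its on_finish argument are hypothetical, not part of 
any proposed spec:

```python
from concurrent.futures import Future


def application(pending):
    # Yielding pauses the application; the server sends the same
    # future back in once it has completed.
    done = yield pending
    return done.result().upper()


def run(gen, on_finish):
    """Hypothetical trampoline: resume gen whenever its future is done."""

    def step(done_future=None):
        try:
            future = gen.send(done_future)
        except StopIteration as stop:
            on_finish(stop.value)
            return
        # Control returns to the reactor here instead of blocking.
        future.add_done_callback(step)

    step()


def demo():
    pending = Future()
    results = []
    run(application(pending), results.append)
    paused = (results == [])   # nothing has run past the yield yet
    pending.set_result("abc")  # the reactor completes the request
    return paused, results
```

Nothing in this sketch ever calls .result() on an incomplete future, so 
the single-threaded reactor is never starved.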

Effectively what I propose adds emulation of threading on top of async 
by mutating an Executor.  (The Executor would be a mixed 
threading+async executor.)

I suggest bubbling a future back up the yield stack instead of the 
actual result to allow the application (or middleware, or whatever 
happened to yield the future) to capture exceptions generated by the 
future'd request.  Bubbling the future instance avoids excessive 
exception handling cruft in each middleware layer; and I see no real 
issue with this.  AFAIK, you can use a shorthand (possibly wrapped in a 
try: block) if all you care about is the result:

    data = (yield my_future).result()
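
To illustrate the try: variant, here is a small sketch in which a 
middleware layer captures an exception raised by the future'd request.  
The middleware() generator and the hand-driven demo() are hypothetical 
stand-ins for a real stack:

```python
from concurrent.futures import Future


def middleware(my_future):
    try:
        data = (yield my_future).result()
    except IOError as exc:
        # Only the layer that cares needs to handle the failure.
        data = "fallback: %s" % exc
    yield data


def demo():
    failed = Future()
    failed.set_exception(IOError("client went away"))
    gen = middleware(failed)
    yielded = next(gen)       # the future bubbles up to the server
    return gen.send(yielded)  # the server sends it back when done
```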

> Truthfully, I don't really see the point of exposing the map() method 
> (which is the only other executor method we'd expose), so it probably 
> makes more sense to just offer a 'wsgi.submit' key... which can be a 
> function as follows: [snip]

True; the executor itself could easily be hidden behind the filter.  In 
a multi-threaded environment, however, the map call poses no problem, 
and can be quite useful.  (E.g. with one of my use cases for inclusion 
of an executor in the environment: image scaling.)

> Granted, this might be a rather long function.  However, since it's 
> essentially an optimization, a given server can decide how many 
> functions can be shortcut in this way.  The spec may wish to offer a 
> guarantee or recommendation for specific methods of certain 
> stdlib-provided types (sockets in particular) and wsgi.input.

+1

> Personally, I do think it might be *better* to offer extended 
> operations on wsgi.input that could be used via yield, e.g. "yield 
> input.nb_read()".  But of course then the trampoline code has 
> to recognize those values instead of futures.

Because wsgi.input is provided by the server, and the executor is 
provided by the server, is there a reason why these extended functions 
couldn't return... futures?  :)

> Note, too, that this complexity also only affects servers that want to 
> offer a truly async API.  A synchronous server has no reason to pay 
> particular attention to what's in a future, since it can't offer any 
> performance improvement.

I feel a sync server and async server should provide the same API for 
accessing the input.  E.g. the application/middleware must be agnostic 
to the server in this regard.  This is why a little bit of magic goes a 
long way.  The following code would work on any WSGI2 stack that offers 
an executor (sync, async, or provided by middleware):

    data = (yield env['wsgi.submit'](env['wsgi.input'].read, 4096)).result()

In a sync server, the blocking read would execute in another thread.  
In an async one appropriate actions would be taken to request a socket 
read from the client.  Both cases pause the application pending the 
result.  (If you don't immediately yield the future the behaviour 
between servers is the same!)
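
For the sync case, a minimal sketch assuming the server simply exposes a 
thread pool's submit method as 'wsgi.submit'.  The make_environ() and 
serve() helpers are hypothetical, and the environ here is reduced to the 
two keys the example needs:

```python
import io
from concurrent.futures import ThreadPoolExecutor


def make_environ(body, executor):
    return {
        'wsgi.input': io.BytesIO(body),
        'wsgi.submit': executor.submit,  # assumed mapping, see above
    }


def application(env):
    data = (yield env['wsgi.submit'](env['wsgi.input'].read, 4096)).result()
    yield data


def serve(body):
    with ThreadPoolExecutor(max_workers=1) as executor:
        gen = application(make_environ(body, executor))
        future = next(gen)
        # A sync server can just wait; the read runs in the worker thread.
        future.result()
        return gen.send(future)
```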

> I do think that this sort of API discussion, though, is the most 
> dangerous part of trying to do an async spec.  That is, I don't expect 
> that everyone will spontaneously agree on the exact same API.  Alice's 
> proposal (simply submitting object methods) has the advantage of 
> severely limiting the scope of API discussions.  ;-)

Since each async server will either implement or utilize a specific 
async framework, each will offer its own "async-supported" featureset.  
What I mean is that all servers should make wsgi.input calls 
async-able; some would go further and make all socket calls async.  Some 
might go even further than that and define an API for external 
libraries (e.g. DBs) to be truly cooperatively async.  I do believe my 
solution is flexible enough for the majority of use cases, and where it 
isn't (i.e. a call would block), "abusing" futures in this way will allow 
an application to reasonably fake async by delegating blocking calls, 
without killing the performance of async servers (which are internally 
single-threaded).

I will have to experiment with determining the type of the class 
instance a method is bound to from the bound method itself; this is the 
crux of the implementation I suggest.  If you can't get that, the idea 
is pooched for anything but wsgi.input which the server would have a 
direct reference to anyway.
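
As a starting point for that experiment, a small sketch assuming Python 
3, where both Python-level bound methods and built-in methods expose the 
instance as __self__.  ASYNC_CAPABLE, owner_of() and can_shortcut() are 
hypothetical names:

```python
import io
import socket

# Types whose methods a given server knows how to run on its reactor.
ASYNC_CAPABLE = (socket.socket,)


def owner_of(method):
    """Return the instance a method is bound to, or None."""
    return getattr(method, '__self__', None)


def can_shortcut(method):
    # The submit function would check this before deciding between the
    # reactor and the thread pool.
    return isinstance(owner_of(method), ASYNC_CAPABLE)


def demo():
    a, b = socket.socketpair()
    try:
        return (
            can_shortcut(a.recv),
            can_shortcut(io.BytesIO(b"x").read),
            type(owner_of(a.recv)).__name__,
        )
    finally:
        a.close()
        b.close()
```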

I hope the clarity of this post didn't degenerate too much over the few 
hours I had it open and noodling around.

	- Alice.



