[Web-SIG] WSGI 2.0

Sat Oct 6 00:07:35 CEST 2007

At 10:13 PM 10/5/2007 +0100, Robin Bryce wrote:
>That's too much chicken/egg for my tastes. All you are really saying is
>that the CGI model covers the majority of 'common' use cases. I don't
>know of anyone who would disagree with this. But as things stand all
>wsgi-ish implementations which aim to support async/sync are consigned
>to the dust bin of 'non conformant'. This acts as a strong
>disincentive to experiment and innovate.
>
>If, for clear technical reasons, nothing can be done to support mixing
>async aware and synchronous applications in WSGI 2.0, then so it goes.
>
>If it can't be done without imposing significant complexity on
>applications that are perfectly happy with the highly successful wsgi
>1.0 model, then fair enough - WSGI-A is a non starter.
>
>Or are you against introducing features to support async servers and
>composition of mixed async/sync stacks on principle ?

Not in *principle*, only in practice.  :)  If you read the archives 
of a few years back, I was rather enthusiastic until I realized that 
there really wasn't any way to make it of practical benefit.

See, in order for a server to take advantage of an application's 
"asynchronous" nature, the server has to *know* the application won't 
"block".  That is, the app has to *promise* not to block.  (Because 
without this promise, the server is forced to run the app in a 
separate thread or process, so as not to block the server.)

But in order for the app to make this promise, it can only use 
components that make the same promise, unless it runs *them* 
in other threads or processes...  which means giving up on easily 
composing applications from multiple WSGI components.

So far, discussion on this matter has hinged on the claim that it's 
*possible* to make such mixed stacks, and I don't disagree.  What 
nobody has shown is that it's 1. practical, and 2. produces some 
actual benefit, compared to the synchronous model now in use.  As a 
practical matter, the vast majority of Python web applications and 
frameworks are synchronous by nature, and those that aren't are 
already tied to a specific async API.

If we were going to try to implement an asynchronous WSGI, what we 
would *really* need to do is discard the app_iter and make write() 
the standard way of sending the body.  This would let us implement a 
CPS (continuation-passing style) API.  We would also have to change 
the input stream so that instead of reading from it, we instead 
passed it functions to be called when input was available, and so 
on.  We would also need a way to tell write() that we were finished 
writing, and some way to manage connection timeouts.
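
To make that concrete, here is a rough sketch of what an app written 
against such a callback-style API might look like.  The names 
CallbackInput, on_read() and the write(None)-means-close convention are 
assumptions for illustration only, not part of any spec, and the 
"server" is just a toy driver that delivers input immediately:

```python
import io

class CallbackInput:
    """Hypothetical callback-style input stream (not part of any WSGI spec)."""
    def __init__(self, stream):
        self.stream = stream

    def on_read(self, size, callback):
        # A real async server would invoke the callback later, once data
        # arrives; this toy version reads synchronously and calls it at once.
        callback(self.stream.read(size))

def echo_app(environ, start_response):
    write = start_response('200 OK', [('Content-Type', 'text/plain')])

    def got_body(data):
        write(data)    # send back the body we were handed
        write(None)    # assumed convention: None means "finished writing"

    # Instead of reading, pass a function to call when input is available.
    environ['wsgi.input'].on_read(1024, got_body)

# Toy driver standing in for a server.
sent, closed = [], []

def start_response(status, headers):
    def write(data):
        if data is None:
            closed.append(True)
        else:
            sent.append(data)
    return write

echo_app({'wsgi.input': CallbackInput(io.BytesIO(b'ping'))}, start_response)
```

Note that echo_app() itself returns nothing useful; everything 
interesting happens inside got_body(), whenever the server gets around 
to calling it.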

Unfortunately, this programming style is verbose and more difficult 
to learn for people versed in less "twisted" ways of programming.  To 
write middleware in this style, you also need to write deeply nested 
functions.  And synchronous servers would need to figure out what to 
do when an application returns without having called start_response() 
yet or figured out how to close the stream.

Anyway, my point here is that I see how we could either cater to 
synchronous apps or async apps in a given API.  But throwing a 
half-baked async API on top of a synchronous one is just making a 
mess and helping no-one.

To sketch a WSGI-A application:

     def app(environ, start_response):
         # start_response() returns the write() callable
         write = start_response('200 Cool', [('content-type','text/plain')])
         write('Hello world!')
         write(None)  # close

And a WSGI 1->WSGI A converter:

     class ReadCallbackWrapper:
         def __init__(self, stream):
             self.stream = stream
         def on_read(self, size, callback):
             callback(self.stream.read(size))

     def wsgi_1_app(environ, start_response):
         running = [1]
         def sr(*args):
             # call the server's start_response, not sr itself
             write = start_response(*args)
             def w(arg):
                 if running:
                     if arg is None:
                         running.pop()
                     else:
                         write(arg)
                 else:
                     raise RuntimeError("Already closed!")
             return w
         environ['wsgi.input'] = ReadCallbackWrapper(environ['wsgi.input'])
         wsgi_a_app(environ, sr)
         while running:
             pass   # really should have a timeout check here
         return []

This highlights the essential difference between a sync and async 
API: the sync API either finishes right away or returns something the 
server calls until it's exhausted.  An async API offers no guarantee 
that anything has been done when the app is called.  Anything could 
happen at any time later.
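
A small sketch of that difference, under the assumption that an async 
server keeps some queue of scheduled callbacks (the `pending` list 
below is a made-up stand-in for an event loop, and write(None) as the 
close marker is the same assumed convention as above):

```python
pending = []   # hypothetical stand-in for an event loop's callback queue

def sync_app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'Hello world!']      # the server iterates this until exhausted

def async_app(environ, start_response):
    def later():
        write = start_response('200 OK', [('Content-Type', 'text/plain')])
        write(b'Hello world!')
        write(None)               # assumed convention: None closes the response
    pending.append(later)         # nothing has been sent when the app returns

out = []

def start_response(status, headers):
    return out.append             # toy write(): just record what was written

# Sync: once the iterable is consumed, the response is complete.
body = b''.join(sync_app({}, start_response))

# Async: the call returns, but no output exists yet.
async_app({}, start_response)
after_return = list(out)

# Only when the "event loop" runs the callbacks does anything happen.
for callback in pending:
    callback()
after_loop = list(out)
```

Here after_return is still empty, while after_loop holds the body and 
the close marker.  A synchronous server calling async_app() has no way 
to tell from the return alone whether the response is done.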

My gut feel is that it's harder to write middleware for a WSGI-A style 
of API, because you have to write at least doubly nested functions if 
you're dealing with the output at all (as this example shows).
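
For instance, a trivial output-transforming middleware might look 
something like this.  It is only a sketch: upper_middleware and 
hello_app are made-up names, and write(None) as the close marker is 
the same assumed convention used in the WSGI-A sketch:

```python
def hello_app(environ, start_response):
    write = start_response('200 OK', [('Content-Type', 'text/plain')])
    write(b'Hello world!')
    write(None)  # assumed convention: None closes the response

def upper_middleware(app):
    def wrapped(environ, start_response):
        # Nesting level 1: intercept start_response to get at write().
        def sr(status, headers):
            write = start_response(status, headers)
            # Nesting level 2: wrap write() to transform the output.
            def w(data):
                # Pass the close marker through; uppercase everything else.
                write(data if data is None else data.upper())
            return w
        app(environ, sr)
    return wrapped

# Toy driver standing in for a server.
out = []

def start_response(status, headers):
    return out.append

upper_middleware(hello_app)({}, start_response)
```

Compare that with the synchronous version, where the same middleware 
could simply loop over the app's return value.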

And if we mix modes, then we have this sort of messy back-and-forth 
adaptation in between.  And as best I can tell, the proposal for a 
mixed-mode API that you gave would actually make it even *harder* 
than this to write WSGI middleware, as there would be similar 
boundary issues for the input stream.