[Web-SIG] WSGI 2.0

Fri Mar 30 06:30:58 CEST 2007

Phillip J. Eby wrote:
> At 07:56 PM 3/29/2007 -0500, Ian Bicking wrote:
>> Do we want to discuss WSGI 2.0?  I added a wiki page here to list
>> anything anyone wants to discuss for 2.0: http://wsgi.org/wsgi/WSGI_2.0
>>
>> I've listed the things I can remember, and copying here:
>>
>>
>> start_response and write
>> ------------------------
>>
>> We could remove ``start_response`` and the writer that it implies.  This
>> would lead to a signature like::
>>
>>      def app(environ):
>>          return '200 OK', [('Content-type', 'text/plain')], ['Hello world']
>>
>> That is, return a three-tuple of (status, headers, app_iter).
>>
>> It's relatively simple to provide adapters to and from this signature to
>> the WSGI 1.0 signature.
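
A minimal sketch of what such adapters might look like, assuming the
three-tuple signature quoted above. The names wsgi2_to_wsgi1 and
wsgi1_to_wsgi2 are hypothetical, and the 1.0-to-2.0 direction buffers the
whole body for simplicity, so it gives up streaming:

```python
def wsgi2_to_wsgi1(app2):
    """Expose a (status, headers, app_iter)-returning app as a WSGI 1.0 app."""
    def app1(environ, start_response):
        status, headers, app_iter = app2(environ)
        start_response(status, headers)
        return app_iter
    return app1


def wsgi1_to_wsgi2(app1):
    """Expose a WSGI 1.0 app as one returning (status, headers, app_iter)."""
    def app2(environ):
        captured = {}
        body = []  # receives both write() calls and yielded chunks, in order

        def start_response(status, headers, exc_info=None):
            captured['status'] = status
            captured['headers'] = list(headers)
            return body.append  # stands in for the legacy write() callable

        app_iter = app1(environ, start_response)
        try:
            # Assumption: buffering the whole response is acceptable here.
            for chunk in app_iter:
                body.append(chunk)
        finally:
            close = getattr(app_iter, 'close', None)
            if close is not None:
                close()
        return captured['status'], captured['headers'], body
    return app2
```

Wrapping an app with one adapter and then the other should give back the
same status, headers, and body chunks.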
> 
> I think we also want to have a value you can yield from the app_iter to 
> explicitly request that the buffer be flushed, and that we should reopen 
> the discussion about values to be yielded to communicate with async 
> servers, indicating that the iterator should be paused pending input or 
> some other operation.

(This should probably be opened as a separate item from the signature 
change, as I don't think it relates much to that.)

I'd rather not introduce new objects, since we don't have any special 
objects so far.  None is the obvious candidate, but its meaning is vague 
in this context.  To me it feels more like a pause than a flush.  Flush 
really means *do* something, and None feels like a no-op, which is more 
like a pause.

I've become interested in using WSGI middleware as an HTTP translating 
proxy, so the async opportunities are of more interest to me now.  Just 
dropping the thread-affinity requirement for app_iter would already 
help, I think.  Dealing with large request bodies is harder, because 
those would have to be processed before the WSGI app returned.  But 
that's less concerning to me.

It seems like if yielding None from an app_iter meant "put me at the 
back of the queue" that would be a fairly simple and effective way of 
handling async for large (or slow) response bodies.  This wouldn't 
really work for the Twisted stuff where you keep a response open and 
trickle out data based on server-side events (because you can't control 
when you get back to the beginning of the queue), but otherwise it seems 
pretty good.  I suppose full control could be allowed if you could do 
something like return an object that could be part of the event loop 
somehow.  If we had some standard async-wrapping-key of some sort, 
perhaps.  For example (I say with no real knowledge of Deferred):

# in the server, before calling the app:
environ['wsgi.async_callback'] = EventMatcher
# in the app:
yield environ['wsgi.async_callback'](some_event)
# in the server, while iterating:
for item in app_iter:
    if isinstance(item, EventMatcher):
        # queue up the app_iter, leaving it paused until something
        # matching that event happens
        pass
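
The simpler "yielding None puts me at the back of the queue" idea can be
sketched as a toy round-robin loop. This is only an illustration under
stated assumptions: serve_round_robin is a hypothetical name, output is
collected in a list instead of being written to sockets, and None is
assumed to be the pause marker.

```python
from collections import deque


def serve_round_robin(app_iters):
    """Drain several app_iters; a yielded None requeues that iterator."""
    queue = deque((index, iter(it)) for index, it in enumerate(app_iters))
    output = []  # (index, chunk) pairs in the order they would be sent
    while queue:
        index, iterator = queue.popleft()
        for chunk in iterator:
            if chunk is None:
                # Pause requested: go to the back of the queue.
                queue.append((index, iterator))
                break
            output.append((index, chunk))
        # An exhausted iterator falls out of the loop and is not requeued.
    return output
```

As noted, the app has no control over when it reaches the front of the
queue again, so this does not cover responses driven by server-side
events.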

I feel somehow that it could be useful for intermediaries to be able to 
filter out this callback, and so a documented key (or keys) would be 
good.  But I can't quite place why I'd want to do that.  Well, except 
that any intermediary would have to be able to detect this kind of 
object and pass it back up.  So maybe instead of filtering it out of the 
environ, there needs to be some easy test that can be applied.

What the event object looks like ("some_event"), I have no idea.

> Ideally, this should be done in a way that's easy for middleware to 
> handle; a flush signal should be handled by the middleware *and* passed 
> up the chain, while any other async signals would be passed directly up 
> the chain (unless it's something like "pause for input" and the 
> middleware controls the input).
> 
> If we do this right, it should be easier to write middleware that works 
> correctly with respect to buffering, since the issues of flushing and 
> pausing now become explicit rather than implicit.  (This should make it 
> easier to teach/learn as well.)

In terms of buffering, I can't think of many cases where it would 
matter.  Either the middleware passes back the response with no changes, 
or it needs to consume the entire response body (and probably headers 
and maybe status) to do whatever transformation it needs to do.

Things like pauses and async signals would ideally be passed upstream, 
but flushes and content would all be consumed by the middleware.
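
A rough sketch of that split, assuming the three-tuple signature and
assuming None is the pause marker (neither is settled). The name
uppercase_middleware is hypothetical, and the transformation is a stand-in
for anything that needs the whole body:

```python
def uppercase_middleware(app):
    """Consume all content, but pass pause signals (None) upstream."""
    def wrapped(environ):
        status, headers, app_iter = app(environ)

        def body():
            buffered = []
            try:
                for chunk in app_iter:
                    if chunk is None:
                        yield None  # pause: hand it straight to the server
                    else:
                        buffered.append(chunk)  # content: consumed here
            finally:
                close = getattr(app_iter, 'close', None)
                if close is not None:
                    close()
            # Only now is the transformed body emitted, as a single chunk.
            yield b''.join(buffered).upper()

        return status, headers, body()
    return wrapped
```

The server still sees every pause, but it never sees the original chunk
boundaries, which is why flushes from the wrapped app have nothing
sensible to apply to.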

>> It's not clear if the app_iter must be used in the same thread as the
>> application.  Since the application is blocking, presumably *it* must be
>> run all in one thread.  This should be more explicitly documented.
> 
> Definitely.  I think that we should not require thread affinity between 
> the application and the app_iter -- my feeling at this point is that 
> actual yielding is an edge case with respect to most WSGI apps.  The 
> common case WSGI application should be just returning a list or tuple 
> with a single string in it, and not doing any complex iteration.  
> Allowing the server more flexibility here is probably the better choice.
> 
> Indeed, I'm not sure we should require thread affinity across 
> invocations of app_iter.next().

It seems unlikely there'd be a need to move it between threads, but then 
it doesn't seem like there's much need for the application to have it 
all called in one thread either (i.e., if you move threads once, moving 
threads again shouldn't be a problem).
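
To illustrate what dropping thread affinity would permit, here is a small
sketch in which each chunk is fetched by a different thread, one at a
time. The name drain_across_threads is hypothetical, and a real server
would use a pool instead of a fresh thread per chunk:

```python
import threading


def drain_across_threads(app_iter):
    """Call next() on app_iter from a fresh thread for every chunk."""
    iterator = iter(app_iter)
    chunks, thread_ids, done = [], [], []

    def step():
        try:
            chunks.append(next(iterator))
        except StopIteration:
            done.append(True)
        thread_ids.append(threading.get_ident())

    while not done:
        worker = threading.Thread(target=step)
        worker.start()
        worker.join()  # never two threads inside the iterator at once
    return chunks, thread_ids
```

A plain generator tolerates this, but an application that keeps state in
thread-locals would not, which is why the spec would need to say so
explicitly.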

-- 
Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org
             | Write code, do good | http://topp.openplans.org/careers