[Twisted-web] Re: [Web-SIG] WSGI woes

Alan Kennedy py-web-sig at xhaus.com
Thu Sep 16 23:41:31 CEST 2004


[Alan Kennedy]
 >> I suppose I'm talking about the server "pushing" the input through
 >> the middleware stack, whereas you're talking about the application at
 >> the top of the stack "pulling" the data up through the stack. Is
 >> that right?

[Phillip J. Eby]
 > That's correct, and that's what I'm trying to avoid if at all
 > possible, because it enormously complicates middleware, to the sole
 > benefit of asynchronous apps -- that mostly aren't going to be
 > portable anyway.

Hmmm. Perhaps I'll resort to explaining my idea through code rather than 
text. Here is my take on a putative blocking *and* asynchronous rot-13 
stream encoder.

But before showing you the combined blocking-and-async one, I want to 
show what I think a blocking-only one would look like:

class blocking_rot13_streamer:

   def __init__(self, environ, start_response):
     self.in_stream = environ['wsgi.input']
     start_response("200 OK", [('Content-Type', 'text/plain-rot13')])

   def __iter__(self):
     return self

   def next(self):
     try:
       return self.in_stream.read().encode('rot-13')
     except EndOfStream:
       raise StopIteration

This looks nice and simple to me. The one that works in both async mode 
and blocking mode looks like this:

class rot13_streamer:

   def __init__(self, environ, start_response):
     self.in_stream = environ['wsgi.input']
     self.buffer = []
     self.end_of_stream = False
     self.resume = None
     if environ.has_key('wsgi.async_input_handler'):
       self.async = True
       environ['wsgi.async_input_handler'](self.input_handler)
     else:
       self.async = False
     self.pause_output = environ['wsgi.pause_output']
     start_response("200 OK", [('Content-Type', 'text/plain-rot13')])

   def input_handler(self):
     try:
       data = self.in_stream.read()
       self.buffer.append(data)
       if self.resume:
         self.resume()
         self.resume = None # Are resumes one-hit or "re-entrant"?
     except EndOfStream:
       self.end_of_stream = True

   def __iter__(self):
     return self

   def next(self):
     if self.async:
       if self.buffer:
         # Pop from the front, so chunks come out in arrival order
         return self.buffer.pop(0).encode('rot-13')
       else:
         if self.end_of_stream:
           raise StopIteration
         else:
           self.resume = self.pause_output()
           return ""
     else:
       try:
         return self.in_stream.read().encode('rot-13')
       except EndOfStream:
         raise StopIteration

In this way, there could be a middleware component below the 
rot13_streamer in the stack that, say, does chunked transfer encoding 
and decoding. It would be the same in form as the above, except that it 
would:

1. Change the environ entry for 'wsgi.async_input_handler' to be its own 
callable that records the callback for the next layer up in the stack, 
the rot13_streamer.input_handler.

2. Create its own buffer, into which it will store chunks decoded from 
the input stream. This buffer, e.g. a StringIO, then replaces 
'wsgi.input' in the environ passed to next middleware component up.

3. When chunks arrive from the client, the server calls the dechunker 
input_handler. This reads the (possibly partial) chunk from the stream, 
decodes it and stores it in its StringIO buffer.

4. When it has a complete chunk, it calls the input_handler of the next 
component in the stack, which will then read the decoded chunk from its 
wsgi.input stream, i.e. the dechunker's StringIO.
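To make steps 1-4 concrete, here is a minimal sketch of such a 
dechunking middleware. Bear in mind it's only a sketch of the proposal, 
not an implementation: the 'wsgi.async_input_handler' key is the 
hypothetical one from the examples above, the class and method names are 
my own, and the "decoding" is simplified to stripping a single 
hex-length prefix per read.

```python
import io

class dechunker:

    def __init__(self, environ):
        self.raw = environ['wsgi.input']       # the server's raw stream
        self.decoded = io.StringIO()           # buffer of decoded chunks
        self.upstream_handler = None
        # Step 1: replace the registration callable with our own, so the
        # component above registers its callback with us, not the server.
        environ['wsgi.async_input_handler'] = self.register
        # Step 2: the component above reads decoded data from our buffer.
        environ['wsgi.input'] = self.decoded

    def register(self, handler):
        self.upstream_handler = handler

    def input_handler(self):
        # Step 3: the server calls this when (possibly partial) chunk
        # data arrives; we decode it into our StringIO buffer.
        data = self.raw.read()
        size_line, sep, body = data.partition('\r\n')
        chunk = body[:int(size_line, 16)]
        # Append at the end without disturbing the current read position.
        read_pos = self.decoded.tell()
        self.decoded.seek(0, 2)
        self.decoded.write(chunk)
        self.decoded.seek(read_pos)
        # Step 4: a complete chunk is ready; notify the component above.
        if self.upstream_handler is not None:
            self.upstream_handler()
```

The point of the sketch is the plumbing, not the chunk parsing: the 
layer above sees only its familiar 'wsgi.input' and callback 
registration, and never learns that dechunking happened underneath it.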

I think that this proposed approach is clean, and not overly complex for 
async or blocking programmers to handle.

But I think we do have to cleanly separate the two. There are real 
problems with trying to run *all* components seamlessly across both 
async and blocking servers. Middleware components that are always going 
to behave correctly in an async situation will have to be designed like 
that from the ground up; it's dangerous to take components written in a 
blocking environment and run them in an async environment.

And lastly, if it is desired to spin jobs into a different thread, e.g. 
the rot-13 job above, then that should be a middleware concern, not the 
WSGI server's. So if a twisted component wants to pass a job to a 
service thread, some other twisted component lower down the stack, 
possibly the framework itself, must have already created the 
threads/queues to enable this. The twisted rot-13 component would then 
have very thin methods (run from the server's main thread) which 
interact with the twisted space i.e. transferring data and receiving 
data back through queues, and layer WSGI semantics on those 
interactions, i.e. pause_output, yield result, yield empty_string, etc.
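A rough sketch of that arrangement (the class name, the method names and 
the codecs.encode call are all my own inventions; the proposal doesn't 
fix any API): the framework owns the service thread and the queues, and 
the component's server-facing methods stay thin, never blocking the 
server's main thread.

```python
import codecs
import queue
import threading

class rot13_service:

    def __init__(self):
        # The framework, not the WSGI server, owns the thread and queues.
        self.jobs = queue.Queue()
        self.results = queue.Queue()
        worker = threading.Thread(target=self._worker)
        worker.daemon = True
        worker.start()

    def _worker(self):
        # Runs in the service thread, i.e. in "twisted space".
        while True:
            data = self.jobs.get()
            self.results.put(codecs.encode(data, 'rot_13'))

    # The thin methods below run from the server's main thread.

    def submit(self, data):
        self.jobs.put(data)

    def poll(self):
        try:
            return self.results.get_nowait()  # result ready: yield it
        except queue.Empty:
            return ""                         # not yet: yield empty string
```

The WSGI semantics then layer on top exactly as described: next() calls 
submit() and poll(), yielding the result when one is queued and the 
empty string (or a pause_output) when it isn't.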

When I described your approach as "pulling data up the stack", I saw a 
bigger difference between the two approaches. I'm thinking now that 
there is little difference between our proposals, except that in mine 
it's the bottom component that gets notified of the input by the server, 
and in yours it's the top component. Though I suppose having the top 
component pulling input from an iterator chain mirrors nicely the 
situation where the server pulls output from an iterator chain.

And my approach basically entails a bunch of nested calls, which might 
be less efficient and less elegant than if, say, generators were used in 
an input processing chain.
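For comparison, such a generator chain might look like the sketch below 
(again just an illustration, with codecs.encode standing in for the 
str.encode of the earlier examples): each middleware stage is a 
generator, and the top component pulls every chunk up through the stack 
simply by iterating.

```python
import codecs

def rot13_stage(upstream):
    # A middleware stage: pull chunks from the component below, yield
    # the transformed chunks to the component above.
    for chunk in upstream:
        yield codecs.encode(chunk, 'rot_13')

# The bottom of the chain is just any iterable of input chunks (the
# server's input stream, ultimately); the top component drives it all.
chain = rot13_stage(iter(["hello", " world"]))
```

This mirrors the output side nicely: no nested calls, no explicit 
buffers, and the flow of control always starts at the top.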

You're right again Phillip :-)

Regards,

Alan.

