[Web-SIG] wsgi and generators (was Re: WSGI and start_response)

P.J. Eby pje at telecommunity.com
Sat Apr 10 19:52:00 CEST 2010


At 02:04 PM 4/10/2010 +0100, Chris Dent wrote:
>I realize I'm able to build up a complete string or yield via a
>generator, or a whole bunch of various ways to accomplish things
>(which is part of why I like WSGI: that content is just an iterator,
>that's a good thing) so I'm not looking for a statement of what is or
>isn't possible, but rather opinions. Why is yielding lots of moderately
>sized strings *very bad*? Why is it _not_ very bad (as presumably
>others think)?

How bad it is depends a lot on the specific middleware, server 
architecture, OS, and what else is running on the machine.  The more 
layers of architecture you have, the worse the overhead is going to be.

The main reason, though, is that alternating control between your app 
and the server means increased request lifetime and worsened average 
request completion latency.

Imagine that I have five tasks to work on right now.  Let us say each 
takes five units of time to complete.  If I have five units of time 
right now, I can either finish one task now, or partially finish 
five.  If I work on them in an interleaved way, *none* of the tasks 
will be done until nearly all twenty-five units have elapsed, and so 
every task will have a completion latency of roughly 25 units.

If I work on them one at a time, however, then one task will be done 
in 5 units, the next in 10, and so on -- for an average latency of 
only 15 units.  And that is *not* counting any task switching overhead.

But it's *worse* than that, because by multitasking, my task queue 
has five things in it the whole time...  so I am using more memory 
and have more management overhead, as well as task switching overhead.
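
Just to put numbers on that, here's a quick back-of-the-envelope 
check (the task count and sizes are purely illustrative):

    # Five tasks of five units each, compared under run-to-completion
    # vs. one-unit round-robin interleaving.
    tasks, size = 5, 5

    # One at a time: completions at 5, 10, 15, 20, 25 units.
    sequential = [size * (i + 1) for i in range(tasks)]
    print(sum(sequential) / float(tasks))    # 15.0 average latency

    # Strict round-robin: completions at 21, 22, 23, 24, 25 units --
    # i.e. everything finishes in the last few units, roughly the
    # "all done at about 25" case described above.
    interleaved = [(size - 1) * tasks + (i + 1) for i in range(tasks)]
    print(sum(interleaved) / float(tasks))   # 23.0 average latency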

If you translate this to the architecture of a web application, where 
the "work" is the server serving up bytes produced by the 
application, then you will see that if the application serves up 
small chunks, the web server is effectively forced to multitask, and 
keep more application instances simultaneously running, with worse 
latency, increased memory usage, etc.

However, if the application hands its entire output to the server in 
one piece, then the "task" is already *done* -- the server doesn't need 
the thread or child process for that app anymore, and can have it do 
something else while the I/O is happening.  The OS is in a better 
position to interleave its own I/O with the app's computation, and 
the overall request latency is reduced.
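
To make that concrete in WSGI terms, compare these two (hypothetical) 
applications -- the names and chunk counts are made up, but the shapes 
are what matter:

    def chatty_app(environ, start_response):
        # Yields many small strings: the server has to keep this
        # request's thread or process around, switching back to it
        # for every chunk it wants to send.
        start_response('200 OK', [('Content-Type', 'text/plain')])
        for i in range(1000):
            yield ('line %d\n' % i).encode('utf-8')

    def batched_app(environ, start_response):
        # Builds the whole body first and hands it over in one piece:
        # as far as the application is concerned, the "task" is done,
        # and the server can get on with the I/O however it likes.
        start_response('200 OK', [('Content-Type', 'text/plain')])
        body = ''.join('line %d\n' % i for i in range(1000)).encode('utf-8')
        return [body]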

Is this a big emergency if your server's mostly idle?  Nope.  Is it a 
problem if you're writing a CGI program or some other direct API that 
doesn't automatically flush I/O?  Not at all.  I/O buffering works 
just fine for making sure that the tasks are handed off in bigger chunks.

But if you're coding up a WSGI framework, you don't really want to 
have it sending tiny chunks of data up a stack of middleware, because 
WSGI doesn't *have* any buffering, and each chunk is supposed to be 
sent *immediately*.
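
Here's roughly what that means -- a do-nothing middleware layer, just 
to show that every chunk the application yields gets shuttled through 
every layer before the server can write it (the class and app names 
are invented for the example):

    class PassThrough(object):
        def __init__(self, app):
            self.app = app

        def __call__(self, environ, start_response):
            for chunk in self.app(environ, start_response):
                # ...whatever per-chunk work a real middleware does...
                yield chunk

    def tiny_chunk_app(environ, start_response):
        start_response('200 OK', [('Content-Type', 'text/plain')])
        for i in range(1000):
            yield b'x'    # one byte per trip up the stack

    # Three layers deep: each single byte makes three extra hops,
    # immediately, because nothing in WSGI is allowed to hold it back.
    wrapped = PassThrough(PassThrough(PassThrough(tiny_chunk_app)))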

Well-written web frameworks usually do some degree of buffering 
already, for API and performance reasons, so for simplicity's sake, 
WSGI was spec'd assuming that applications would send data in 
already-buffered chunks.
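
For example, a framework can let application code "write" as many 
small pieces as it likes internally, and only hand WSGI the joined-up 
result -- a sketch, not any particular framework's API:

    def buffered_app(environ, start_response):
        pieces = []
        write = pieces.append      # what framework-level code calls

        # ...template/application code writes lots of small strings...
        for i in range(1000):
            write('row %d\n' % i)

        body = ''.join(pieces).encode('utf-8')
        start_response('200 OK', [
            ('Content-Type', 'text/plain'),
            ('Content-Length', str(len(body))),
        ])
        return [body]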

(Specifically, the simplicity of not needing to have an explicit 
flushing API, which would otherwise have been necessary if middleware 
and servers were allowed to buffer the data, too.)


