[Web-SIG] WSGI Open Space @ PyCon.

Sat Mar 28 13:18:44 CET 2009

On Sat, Mar 28, 2009 at 2:53 AM, Graham Dumpleton
<graham.dumpleton at gmail.com> wrote:
> 2009/3/28 Mark Ramm <mark.mchristensen at gmail.com>:
>> My thought is that we should do a couple of things to the WSGI standard,
>> and if anything like the lifecycle methods gets addressed, it should
>> be pushed into a "container" standard or something.
>>
>> I think Robert Brewer's WSGI Service Bus proposal that he made a
>> couple years ago at PyCon needs a new name, but it does provide a good
>> start on the lifecycle stuff.
>
> From memory, my concern over that specification was that it sort of
> assumed that applications were all preloaded. I am not sure how well
> it would work where lazy loading is performed and where there are
> multiple WSGI applications running in an interpreter but where they
> weren't themselves mounted within a WSGI application, but through
> external mechanisms dictated by the WSGI hosting mechanism.
>
>> As for WSGI itself, we should make a couple of smaller changes which I
>> think will likely be a bit easier to quantify and agree on. I'm sure
>> lots more folks from yesterday's discussion will chip in here, but
>> this is my take on the things we discussed.
>>
>> 1) We should drop the start_response callable, and return a
>> three-member tuple from the wsgi callable:
>>
>>   def wsgi2app(environ):
>>         ....
>>         return (status_code, headers, response_iterator)
>>
>> 2) We should turn wsgi.input into an iterator rather than a somewhat
>> file-like object.   WSGI middleware that reads part of the wsgi.input
>> iterator should make sure to restore it using itertools.chain or
>> replace it with whatever.  If there's a content length specified from
>> the server, the middleware should be responsible for maintaining or
>> deleting that information as necessary.   A content length of 0 is
>> allowed and means there's no data, whereas an unspecified content
>> length indicates that the value is unknown.  This will create a good
>> symmetry between the input and output methods, and seems like a good
>> compromise between flexibility for middleware creators and ease of
>> use for consumers.
>
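
Items 1 and 2 above could look roughly like the sketch below. Nothing here
is part of any agreed spec: the names peeking_middleware and
"example.first_chunk" are made up for illustration, and the status is
shown as a "200 OK" string only as an assumption. The middleware reads one
chunk from the wsgi.input iterator and then restores it with
itertools.chain so the wrapped application still sees the whole body.

```python
import itertools


def wsgi2app(environ):
    # Hypothetical tuple-returning application: no start_response callable,
    # just (status, headers, response_iterator).
    body = b"".join(environ["wsgi.input"])
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    return ("200 OK", headers, [body])


def peeking_middleware(app):
    # Middleware that inspects the first chunk of the request body and then
    # puts it back in front of the remaining chunks.
    def wrapper(environ):
        stream = iter(environ["wsgi.input"])
        first = next(stream, b"")
        environ["example.first_chunk"] = first  # hypothetical key
        environ["wsgi.input"] = itertools.chain([first], stream)
        return app(environ)

    return wrapper


environ = {"wsgi.input": iter([b"hello ", b"world"])}
status, headers, body = peeking_middleware(wsgi2app)(environ)
```

Because the middleware here does not change the body, any content length
supplied by the server stays valid; middleware that rewrites the body would
have to update or delete it.
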
> The problem with an iterator/generator is how do you control the size
> of the chunks of data returned. An iterator also probably isn't going
> to make chunked request content any easier to handle.
>
> It may be easier to change how people use the wsgi.input that exists
> now. First off allow one to say:
>
>  wsgi.input.read()
>
> to get all input, rather than passing CONTENT_LENGTH as argument.
>
> To consume all data in chunks until exhausted, require a proper EOF
> indicator in the form of an empty string read; then one can say:
>
>  s = wsgi.input.read(BLOCKSIZE)
>  while s:
>    # do something with 's'
>    s = wsgi.input.read(BLOCKSIZE)
>
> That way you don't have to muck around with checking how much you have read.
>
> This does require that an exception be raised if client closes
> connection before all data expected was read.
>
> The question thus is, what would be the actual benefits of changing to
> an iterator/generator.
>
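
For comparison, here is a minimal sketch of the read loop above wrapped in
a generator. It assumes the input object follows the proposed rule that
read() returns an empty string at end of input; io.BytesIO stands in for
wsgi.input, and read_chunks and the BLOCKSIZE value are illustrative names
only. The caller picks the chunk size, which is the control a bare iterator
would not give.

```python
import io

BLOCKSIZE = 8192  # illustrative value, not taken from any spec


def read_chunks(stream, blocksize=BLOCKSIZE):
    """Yield chunks from a file-like object until read() returns empty."""
    chunk = stream.read(blocksize)
    while chunk:
        yield chunk
        chunk = stream.read(blocksize)


# io.BytesIO plays the role of wsgi.input for this example.
body = io.BytesIO(b"a" * 20000)
sizes = [len(chunk) for chunk in read_chunks(body)]
```
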
>> 3) The server should encode the headers and include explicit
>> information about the encoding in the WSGI environ variable, so that
>> any assumptions about what the bytes in the headers represent are made
>> explicit.
>
> That could be fun. For Apache/mod_wsgi at least you are in control of
> the conversion. In Python 3.0 and CGI/WSGI the os.environ variables
> are already unicode strings because they were converted by Python. How
> this is done varies between UNIX and Windows platforms.
>
>> I think we're all very sold on item 1, and items 2 and 3 require more
>> thinking, but seemed reasonable to those present at the discussion
>> this afternoon.    Hopefully we'll be meeting again on Saturday and
>> will be able to continue to think through this stuff and push this all
>> forward some more.
>>
>> I'm sure there will also be several other minor tweaks to the spec, like:
>
> Yeah, like defining how wsgi.file_wrapper should behave where response
> Content-Length is defined but wrapped file actually provides more
> content than that.
>
>> * Not decoding encoded slashes in path strings, so that
>> applications can tell the difference between path separators and
>> encoded slashes.
>
> When sitting on top of Apache, whether it be mod_wsgi, fastcgi, scgi,
> ajp or CGI, you don't really have much choice, you get what Apache
> gives you.

Which is fine, I guess, but it does make it impossible to tell the
difference between real slashes and encoded ones in WSGI application
code.  I would love it if there were some way around that.
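
To make the ambiguity concrete, here is a small sketch using the standard
library's urllib.parse.unquote. The path is invented, and the example
assumes the application could somehow get at the undecoded path, which is
exactly what is not guaranteed today. Decoding before splitting loses the
distinction; splitting first and decoding each segment keeps it.

```python
from urllib.parse import unquote

# Made-up request path in which one segment contains an encoded slash.
raw_path = "/files/reports%2F2009/summary"

# Decode first, then split: the encoded slash looks like a real separator.
decoded_first = unquote(raw_path).split("/")

# Split first, then decode each segment: the encoded slash stays inside
# its segment.
split_first = [unquote(segment) for segment in raw_path.split("/")]
```
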

>> * adding a "ClientWentAway" exception that indicates that wsgi.input
>> has not been officially exhausted, but that the client went away before
>> wsgi.input was fully populated.
>
> The problem with an exception is what namespace do you put it in. You
> almost need to have the type as part of the WSGI environment. You may
> just be better standardising it by saying that an IOError must be
> raised and leave it at that. At the moment most stuff doesn't even pay
> attention to the fact that an exception could occur for some WSGI
> adapters.

That would be fine with me.   The issue is definitely not the
exception name, but the fact that one can be raised/caught in a
standard way.
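
As a rough sketch of what that could look like in application code if
IOError were the agreed signal: TruncatedInput below is a made-up stand-in
for a wsgi.input whose client disconnects early, and read_body is a
hypothetical helper. No server is known to behave exactly this way. (In
Python 3, IOError is an alias of OSError.)

```python
import io


class TruncatedInput(io.BytesIO):
    """Stand-in for wsgi.input: raises IOError if the data runs out
    before the expected number of bytes has been delivered."""

    def __init__(self, data, expected):
        super().__init__(data)
        self.expected = expected

    def read(self, size=-1):
        chunk = super().read(size)
        if not chunk and self.tell() < self.expected:
            raise IOError("client closed connection before body was complete")
        return chunk


def read_body(stream, blocksize=4):
    """Return (complete, data); complete is False if the client went away."""
    parts = []
    try:
        chunk = stream.read(blocksize)
        while chunk:
            parts.append(chunk)
            chunk = stream.read(blocksize)
    except IOError:
        return False, b"".join(parts)
    return True, b"".join(parts)


ok_result = read_body(TruncatedInput(b"whole", expected=5))
cut_result = read_body(TruncatedInput(b"partial", expected=20))
```
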