[Web-SIG] PEP 333 and gzipping of responses

Tue Aug 11 07:54:25 CEST 2009

2009/8/11 James Y Knight <foom at fuhm.net>:
> On Aug 10, 2009, at 10:11 PM, James Bennett wrote:
>>
>> Earlier today I posted an article on my blog following up on some
>> discussions of WSGI
>
> I find it a bit odd that you again claim WSGI doesn't support chunked
> transfers after that was thoroughly explained here, already.

WSGI applications themselves shouldn't deal with chunked transfer
encoding. In other words, a WSGI application should not format its
response in chunked form as per the HTTP specification. That doesn't
stop the underlying web server from doing so where no content length
is supplied, but that has nothing to do with WSGI; it is a completely
separate concern, relevant only to the web server layer and so out of
scope of the WSGI specification. Robert has already indicated that the
web server underlying the CherryPy WSGI server does this, and I can
say that Apache does as well. By virtue of that, mod_wsgi can also
generate chunked response content, although it isn't actually a
feature of mod_wsgi itself.
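
To make the response side concrete, here is a minimal sketch, assuming
Python 3 (where body blocks are bytes). The application name and the
body content are illustrative only. The application yields plain
blocks and sets no Content-Length; whether the response then goes out
chunked is up to the server underneath.

```python
def application(environ, start_response):
    # No Content-Length header is set, so the server underneath is free
    # to apply chunked transfer encoding itself if it supports it.
    start_response("200 OK", [("Content-Type", "text/plain")])

    def body():
        # Plain blocks of data: no chunk-size lines, no trailers.
        yield b"first block\n"
        yield b"second block\n"

    return body()
```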

As for request content, it is also the concern of the underlying web
server and not the WSGI application. That said, the way the WSGI
specification is drafted makes it impossible for a WSGI application to
handle a request which uses chunked content directly. This is because
wsgi.input isn't required to return an empty string as an end of input
sentinel, which means one cannot just read until all request content
is exhausted. Instead, an application has to rely on CONTENT_LENGTH to
determine how much it can actually read. With chunked request content,
though, there is no CONTENT_LENGTH. The WSGI specification follows CGI
here, so if CONTENT_LENGTH is not supplied you are supposed to assume
that it is 0. As such, there is no way to indicate that input can be
present but is of unknown length, and so chunked request content
cannot be handled directly by a WSGI compliant application.
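
As a rough sketch of what that means in practice, a compliant
application reads its input along these lines. The helper name
read_request_body and the stand-in environ are assumptions made for
illustration, not part of any server's API.

```python
import io

def read_request_body(environ):
    # A missing or empty CONTENT_LENGTH is treated as 0, following CGI.
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        length = 0
    if length <= 0:
        return b""
    # Never read past CONTENT_LENGTH; wsgi.input need not signal end of input.
    return environ["wsgi.input"].read(length)

# Stand-in environ: only the first 5 bytes are declared by the client.
environ = {"CONTENT_LENGTH": "5", "wsgi.input": io.BytesIO(b"hello, and more")}
body = read_request_body(environ)
```

With no CONTENT_LENGTH at all, as for a chunked request, the helper
returns an empty body even though data is waiting on the stream.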

In the web server that underlies the CherryPy WSGI server, Robert
tries to address this by reading in all input for a chunked request up
front and determining CONTENT_LENGTH before passing it to the WSGI
application. This prevents the WSGI application from streaming the
request content itself, and raises the question of what to do if the
request content is large. If the WSGI application were doing the
streaming, it could decide to halt on finding more than it wants to
deal with. When the reading is done in the web server, though, the
WSGI application doesn't have that level of control.
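
The buffering approach can be sketched roughly as follows. This is not
CherryPy's actual code; the function name buffer_chunked_body and the
raw stream argument are hypothetical, and there is no error handling
for malformed or truncated input.

```python
import io

def buffer_chunked_body(raw, environ):
    """Decode a chunked body from `raw` in full, then expose it with a length."""
    body = io.BytesIO()
    while True:
        size_line = raw.readline()
        # Chunk extensions after ';' are ignored in this sketch.
        size = int(size_line.split(b";", 1)[0].strip(), 16)
        if size == 0:
            break
        body.write(raw.read(size))
        raw.readline()  # consume the CRLF that ends the chunk data
    # Skip any trailer headers up to the terminating blank line.
    while raw.readline() not in (b"\r\n", b"\n", b""):
        pass
    # The whole body is now held in memory, so its length is known.
    environ["CONTENT_LENGTH"] = str(body.tell())
    body.seek(0)
    environ["wsgi.input"] = body
    return environ
```

Note that the entire body sits in memory before the application sees a
single byte, which is exactly the loss of control described above.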

In Apache/mod_wsgi, versions before 3.0 will reject chunked requests
outright. In 3.0+ you will be able to optionally specify a directive
which allows chunked request content, but you have to consciously step
outside the bounds of WSGI, ignoring CONTENT_LENGTH and instead
reading to end of input, if you want to handle chunked request
content. Thus, your application wouldn't be WSGI compliant. Some
number of users accept this though, as it is the only way to handle
uploads from some mobile phones, which use chunked request content for
large uploads.
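
Stepping outside of WSGI in that way looks something like the sketch
below. It assumes the server's wsgi.input returns an empty string at
end of input, which the specification does not guarantee. The names
read_to_end and RequestTooLarge, and the limit and block size, are
made up for the example.

```python
import io

class RequestTooLarge(Exception):
    """Raised when the request body exceeds what the application accepts."""

def read_to_end(stream, limit, block_size=8192):
    # Deliberately ignores CONTENT_LENGTH and relies on read() returning
    # an empty string at end of input, so this is not WSGI compliant.
    blocks = []
    total = 0
    while True:
        block = stream.read(block_size)
        if not block:
            break
        total += len(block)
        if total > limit:
            # The application decides for itself when to stop reading.
            raise RequestTooLarge("more than %d bytes of input" % limit)
        blocks.append(block)
    return b"".join(blocks)

# Stand-in for environ["wsgi.input"] on a server that allows chunked input.
data = read_to_end(io.BytesIO(b"x" * 20000), limit=30000)
```

Because the application does the reading, it can give up as soon as
the limit is passed instead of having the server buffer everything.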

This issue of there being no way to handle content of unknown length
also means you cannot have mutating input filters. For example, you
cannot compress request content and have mod_deflate in Apache
uncompress it, as the resulting content will normally be of a
different length to that specified by CONTENT_LENGTH, which will be
the compressed length.
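
The length mismatch is easy to show with a stand-in environ. The
payload here is made up, and a BytesIO holding the already
uncompressed data plays the part of the server's input filter.

```python
import gzip
import io

original = b"field=value&" * 100
compressed = gzip.compress(original)

# The client sent the compressed body, so CONTENT_LENGTH is its length.
environ = {
    "CONTENT_LENGTH": str(len(compressed)),
    # Stand-in for an input filter that has already inflated the body.
    "wsgi.input": io.BytesIO(original),
}

# A compliant application reads only CONTENT_LENGTH bytes ...
received = environ["wsgi.input"].read(int(environ["CONTENT_LENGTH"]))
# ... and so sees only the start of the uncompressed content.
truncated = len(received) < len(original)
```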

Now, I have described the CherryPy WSGI server as being layered, i.e.,
web server and then WSGI adapter. I know that it may not be that clear
cut and that they are one and the same, but logically there is a
split, even if the code is much intertwined. I am sure Robert will
correct me if my understanding is wrong. :-)

Graham