[Web-SIG] WSGI input filter that changes content length.

Mon Jan 15 00:22:07 CET 2007

How does one implement in WSGI an input filter that manipulates the request
body in such a way that the effective content length would be changed?

In the WSGI PEP it says:

  CONTENT_LENGTH
    The contents of any Content-Length fields in the HTTP request. May be
    empty or absent.

Also, it says:

  The server is not required to read past the client's specified Content-
  Length, and is allowed to simulate an end-of-file condition if the
  application attempts to read past that point. The application should not
  attempt to read more data than is specified by the CONTENT_LENGTH
  variable.

Is the absence of CONTENT_LENGTH meant to imply that the content length is
actually 0, i.e., no content, or may it indicate that the application should
perform a read() with no argument to get all the data that may be present and
infer the actual content length from the data returned?
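
To make the first reading concrete, here is a minimal sketch of how an
application might consume the body if an empty or absent CONTENT_LENGTH is taken
to mean "no content". The helper name read_body is hypothetical, and the sketch
assumes Python 3, where wsgi.input yields bytes. It is one possible
interpretation of the PEP text quoted above, not something the PEP mandates.

```python
def read_body(environ):
    """Read the request body, treating a missing CONTENT_LENGTH as no body."""
    raw = environ.get("CONTENT_LENGTH") or ""
    try:
        length = int(raw)
    except ValueError:
        # Empty, absent or malformed: assume there is nothing to read.
        length = 0
    if length <= 0:
        return b""
    # Never ask for more than the client said it would send.
    return environ["wsgi.input"].read(length)
```

Under the second reading, the application would instead call
environ["wsgi.input"].read() with no argument when CONTENT_LENGTH is missing,
which is only safe if the server simulates end-of-file at the end of the body.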

The problem I am trying to address here is how one might use WSGI to implement
a decompression filter for the body of a request, i.e., where
"Content-Encoding: gzip" has been specified.

In this situation, when the middleware's application callable is invoked, it
will know that the content length is likely to change but not what the new
content length will actually be. As a consequence, the only thing it can really
do at that point is zap CONTENT_LENGTH from the environment to indicate that
the value can't actually be trusted.
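
A rough sketch of this first approach follows, assuming Python 3 and the
standard zlib module. The class names GunzipInput and GunzipRequestMiddleware
are hypothetical, and only read() is implemented; a real wsgi.input replacement
would also need readline(), readlines() and iteration. The wrapper stops reading
from the server once the original (compressed) length has been consumed, then
reports end-of-file itself.

```python
import zlib


class GunzipInput:
    """Lazily decompress a gzip body; only read() is provided in this sketch."""

    def __init__(self, stream, compressed_length, chunk_size=8192):
        self._stream = stream
        self._remaining = compressed_length  # bytes the client promised
        self._chunk_size = chunk_size
        self._decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)  # gzip framing
        self._buffer = b""
        self._eof = False

    def _fill(self, size):
        # Pull compressed chunks until enough decompressed data is buffered.
        while not self._eof and (size < 0 or len(self._buffer) < size):
            want = min(self._chunk_size, self._remaining)
            chunk = self._stream.read(want) if want > 0 else b""
            if chunk:
                self._remaining -= len(chunk)
                self._buffer += self._decomp.decompress(chunk)
            else:
                self._buffer += self._decomp.flush()
                self._eof = True

    def read(self, size=-1):
        self._fill(size)
        if size < 0:
            data, self._buffer = self._buffer, b""
        else:
            data, self._buffer = self._buffer[:size], self._buffer[size:]
        return data


class GunzipRequestMiddleware:
    """Wrap wsgi.input and drop CONTENT_LENGTH when the body is gzipped."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        encoding = environ.get("HTTP_CONTENT_ENCODING", "").strip().lower()
        if encoding == "gzip":
            try:
                length = int(environ.get("CONTENT_LENGTH") or 0)
            except ValueError:
                length = 0
            environ = dict(environ)  # leave the caller's environ untouched
            environ["wsgi.input"] = GunzipInput(environ["wsgi.input"], length)
            # The decompressed length is unknown, so remove the stale value.
            environ.pop("CONTENT_LENGTH", None)
            environ.pop("HTTP_CONTENT_ENCODING", None)
        return self.app(environ, start_response)
```

This only works if the downstream application is prepared to call read() with
no argument (or to keep reading until it gets an empty result) when
CONTENT_LENGTH is absent, which is exactly the behaviour the PEP leaves unclear.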

The only other option would be for the middleware, when it is invoked, to
actually read the data in, decompress it and buffer it. Having done this, it
will know what the new content length is and can change CONTENT_LENGTH before
calling the downstream application. Doing this has various downsides, though.
The first is that, if HTTP/1.1 is used, the read can trigger a 100 Continue to
be sent back to the client before the real consumer application is ready to
start using the data. The application may eventually decide, before even
attempting to consume the data, that it wants to reject the request, but at
that point it is too late, inasmuch as the data has already been consumed by
the middleware and the client has sent it unnecessarily. The other downside is
the need to buffer the data. If it is a small amount of data then in-memory
buffering may suffice, but if it is huge then disk-based caching would be
necessary.
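
For comparison, the buffering alternative could be sketched roughly as below,
again assuming Python 3. The function name buffer_gunzipped_body and the 64 KB
threshold are made up for illustration. tempfile.SpooledTemporaryFile keeps
small bodies in memory and spills to disk past max_size, which covers both
cases mentioned above, but the whole body is still read before the downstream
application gets a chance to reject the request.

```python
import tempfile
import zlib


def buffer_gunzipped_body(environ, max_in_memory=64 * 1024, chunk_size=8192):
    """Decompress the whole body up front; return a copy of environ
    with wsgi.input and CONTENT_LENGTH describing the decompressed data."""
    try:
        remaining = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        remaining = 0
    source = environ["wsgi.input"]
    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)  # expect gzip framing
    spool = tempfile.SpooledTemporaryFile(max_size=max_in_memory)
    total = 0
    while remaining > 0:
        chunk = source.read(min(chunk_size, remaining))
        if not chunk:
            break  # client sent less than it promised
        remaining -= len(chunk)
        data = decomp.decompress(chunk)
        spool.write(data)
        total += len(data)
    tail = decomp.flush()
    spool.write(tail)
    total += len(tail)
    spool.seek(0)  # rewind so the application reads from the start

    new_environ = dict(environ)
    new_environ["wsgi.input"] = spool
    new_environ["CONTENT_LENGTH"] = str(total)
    new_environ.pop("HTTP_CONTENT_ENCODING", None)
    return new_environ
```

A middleware would call this before invoking the downstream application and
pass the returned environ along, so the application sees an accurate
CONTENT_LENGTH at the cost of the early read described above.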

So, how is one meant to deal with this in WSGI?

Graham