[Web-SIG] WSGI Content-Length issues.

Wed Jan 9 08:33:56 CET 2008

Graham Dumpleton wrote:
> Can the group mind provide some clarification on the following please.
> 
> 1. The WSGI specification does not require that a WSGI 
> adapter provide an EOF indicator if an attempt is made to 
> read more data from wsgi.input than defined by request 
> Content-Length.

This is not a problem when the Content-Length header is provided in the
request, because the application should never read more than
<Content-Length> bytes.

RFC 2616 says "The presence of a message-body in a request is signaled
by the inclusion of a Content-Length or Transfer-Encoding header field
in the request's message-headers." If those headers are missing, then
the application has to assume there is no message body, and the WSGI
gateway is free to dispose of any message body it can detect.
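
As a minimal sketch of those two rules from the application's side (the
helper name read_request_body is hypothetical, and the BytesIO object
merely stands in for a keep-alive socket), reads can be bounded by
CONTENT_LENGTH like this:

```python
import io

def read_request_body(environ):
    """Read at most CONTENT_LENGTH bytes from wsgi.input.

    Hypothetical helper for illustration; not part of PEP 333.
    """
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        length = 0
    if length <= 0:
        # No usable Content-Length: assume there is no message body.
        return b""
    # Never ask for more than the declared length, so the result does
    # not depend on whether the gateway signals EOF.
    return environ["wsgi.input"].read(length)

# Simulated connection: a 5-byte body followed by the next request.
environ = {
    "CONTENT_LENGTH": "5",
    "wsgi.input": io.BytesIO(b"helloGET /next HTTP/1.1\r\n"),
}
body = read_request_body(environ)
```

Here body holds only the five declared bytes, and the start of the next
request is left untouched on the stream.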

I do agree that the handling of chunked request bodies is not ideal; the
current wording implies that the gateway must buffer the entire chunked
request body until it can calculate the Content-Length, before calling
the application object. This pretty much defeats the purpose of chunked
encoding. On the other hand, it is a pretty minor issue because chunked
request bodies are very rare.
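
To illustrate what that buffering amounts to, here is a rough sketch of
a gateway decoding a chunked body in full before it can set
CONTENT_LENGTH. The function name buffer_chunked_body is made up for
illustration, and handling of malformed input is omitted:

```python
import io

def buffer_chunked_body(stream):
    """Decode a chunked request body completely and return it as bytes.

    Illustrative sketch only: no size limits and no error handling.
    """
    body = bytearray()
    while True:
        size_line = stream.readline()
        # Ignore any chunk extensions after ';'.
        size = int(size_line.split(b";", 1)[0].strip(), 16)
        if size == 0:
            break
        body += stream.read(size)
        stream.read(2)  # CRLF that terminates the chunk data
    # Skip optional trailer headers up to the final blank line.
    while stream.readline() not in (b"\r\n", b"\n", b""):
        pass
    return bytes(body)

raw = io.BytesIO(b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n")
decoded = buffer_chunked_body(raw)
content_length = str(len(decoded))  # only known after the last chunk
```

The application cannot be called until the final chunk has arrived,
which is exactly the cost described above.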

> Is though a WSGI adapter required to 
> explicitly discard any request content which wasn't consumed 
> or is the WSGI applications responsibility to ensure that all 
> request content up to the length specified is always consumed?

Given the existing body of applications that ignore extraneous message
bodies, it only makes sense to put the burden on the gateway. In
particular, a request entity is allowed syntactically on a GET request,
but any such entity must not affect the semantics of the request--that
is, an application should always ignore it. And I've never seen any
WSGI applications that attempt to consume request entities on a GET
request. It is pretty common to ignore the request entities on PUT and
POST requests too (e.g. for conditional requests).

> I have seen some reports to suggest that some WSGI 
> adapter/servers do not discard unread content up to 
> Content-Length, resulting in the problem that if Keep-Alive 
> was enabled that the server may incorrectly try and interpret 
> the remaining content as the header of the next request on 
> that same socket connection.

If the WSGI gateway cannot detect the end of one request and the start
of the next one, regardless of what the application does, then it is
faulty. That is the primary reason that RFC 2616 requires Content-Length
or Transfer-Encoding headers on messages with entity bodies. The WSGI
spec could be more explicit, but I don't think anybody is going to stand
up and
say "I refuse to parse requests correctly because PEP 333 doesn't
explicitly require me to." I think we just need to report these bugs to
the gateway authors and let (help) them fix them.
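
One way a gateway might do this is sketched below, under the assumption
that it wraps the connection in a length-limited reader. The class name
LimitedInput and its discard() method are hypothetical, not taken from
any existing server:

```python
import io

class LimitedInput:
    """Hypothetical wsgi.input wrapper that stops at Content-Length."""

    def __init__(self, stream, length):
        self.stream = stream
        self.remaining = length

    def read(self, size=-1):
        # Clamp every read to what is left of the declared body, so the
        # application sees EOF (b"") once Content-Length is exhausted.
        if size is None or size < 0 or size > self.remaining:
            size = self.remaining
        data = self.stream.read(size)
        self.remaining -= len(data)
        return data

    def discard(self, chunk=8192):
        """Consume whatever part of the body the application left unread."""
        while self.remaining > 0:
            if not self.read(min(chunk, self.remaining)):
                break  # peer closed the connection early

# Simulated keep-alive connection carrying two requests back to back.
conn = io.BytesIO(b"ignored-bodyGET /next HTTP/1.1\r\n")
body = LimitedInput(conn, 12)
body.read(3)      # the application reads only part of the entity
body.discard()    # the gateway drains the rest after the app returns
next_line = conn.readline()
```

After discard(), the next line on the connection is the request line of
the following request rather than leftover entity bytes.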

> 2. If a WSGI application sets a Content-Length in a response 
> and then returns response content of a greater length, should 
> the WSGI adapter attempt to discard any additional output 
> beyond the length set by the application or just pass it 
> through? What obligations do WSGI middleware have in this respect?
>
> If the answer is that the WSGI adapter shouldn't care and 
> should just pass everything through, then would it be seen as 
> at least prudent that the WSGI adapter log a warning message 
> that the returned response content differs in length to the 
> specified Content-Length? Same applies where a WSGI 
> application finished successfully but didn't return as much 
> output as it said it was going to.

If the application wants well-defined behavior, then it should always
ensure that it sends a response body that is exactly <Content-Length>
bytes long. That is because all the front-end web servers, proxy
servers, and client applications that process the response depend on the
response being compliant with RFC 2616. When the Content-Length header
is wrong, the results are unpredictable, regardless of what the WSGI
gateway tries to do. When you have to choose between being compliant
with RFC 2616 or being compliant with PEP 333, always choose RFC 2616.

Consequently, the server is free to do whatever it wants when the
Content-Length is wrong: it can truncate overly long entities, or drop
the connection entirely. Such results are likely to occur somewhere
along the way to the client anyway. The application shouldn't expect a
successful or even consistent result. 
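
As an illustration of the truncation option only, a gateway could wrap
the application's iterable roughly as follows. The function name
enforce_content_length is an assumption of this sketch, and the
exception is just a placeholder for dropping the connection:

```python
def enforce_content_length(chunks, content_length):
    """Yield at most content_length bytes from the application's iterable.

    Hypothetical gateway helper; PEP 333 does not define it.
    """
    remaining = content_length
    for chunk in chunks:
        if remaining <= 0:
            break  # silently drop everything past the declared length
        if len(chunk) > remaining:
            chunk = chunk[:remaining]
        remaining -= len(chunk)
        yield chunk
    if remaining > 0:
        # The body was too short. The gateway cannot invent the missing
        # bytes, so a real server would have to abort the connection.
        raise RuntimeError("response shorter than Content-Length")

sent = list(enforce_content_length([b"abc", b"def"], 4))
```

The gateway is still responsible for calling close() on the original
iterable if it has one, even when it stops iterating early.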

(Note that when I say "the Content-Length is wrong" I am not referring
to the case where the application does not include a Content-Length
header at all.)

> 3. Similarly, where a WSGI adapter supports wsgi.file_wrapper 
> and the Content-Length header was set in the response, should 
> the WSGI adapter send only at most that amount of data? This 
> question applies whether or not the WSGI adapter is able to 
> optimise the sending of the response because of the presence 
> of fileno() or other platform specific feature which would 
> facilitate such optimisations.

The specification is clear about this: "The semantics [...] should be
the same as if the application had returned iter(filelike.read, ''). In
other words, transmission should begin at the current position within
the "file" at the time that transmission begins, and continue until the
end is reached." However, I think this is truly an error in the
specification--the gateway should not be required to send more than
<Content-Length> bytes if the application set the Content-Length header.
Really, this is just a special case of the situation described above,
where the application is trying to send a larger (or smaller) body than
it claimed in the Content-Length header.
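
A sketch of the behavior I am arguing for, which is not what PEP 333
currently specifies: the fallback iteration over the file-like object
stops after <Content-Length> bytes when the header was set. The function
name send_file and its parameters are hypothetical:

```python
import io

def send_file(filelike, content_length=None, blksize=8192):
    """Yield blocks from filelike, starting at its current position.

    If content_length is given, stop after that many bytes; otherwise
    read until EOF. Hypothetical helper for illustration.
    """
    remaining = content_length
    while remaining is None or remaining > 0:
        size = blksize if remaining is None else min(blksize, remaining)
        data = filelike.read(size)
        if not data:
            break  # file ended before Content-Length was reached
        if remaining is not None:
            remaining -= len(data)
        yield data

f = io.BytesIO(b"0123456789")
f.seek(2)  # transmission begins at the current position
limited = b"".join(send_file(f, content_length=3))
```

Without a content_length the loop degenerates to reading until the end
of the file, which matches the wording quoted from the specification.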

Again, when you have to choose between being compliant with RFC 2616 or
being compliant with PEP 333, always choose RFC 2616.

> 4. Where a WSGI adapter supports wsgi.file_wrapper and the 
> Content-Length header was NOT set in the response, where 
> optimisations are being performed and the WSGI adapter can 
> (or must in order to send it) calculate the length of the 
> output, can the WSGI adapter add its own Content-Length 
> header indicating the actual amount of response content sent.

PEP 333 already clearly states that the WSGI gateway can add a
Content-Length header whenever it wants to, if the application didn't
supply one: "[...T]he server or gateway may be able to either generate a
Content-Length header, or at least avoid the need to close the client
connection."
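
For the simplest case, where the application returns a sequence
containing a single string, a gateway could do something like the
following. The function name add_content_length is hypothetical, and
this sketch deliberately does nothing when the length is not cheap to
determine:

```python
def add_content_length(headers, body_iterable):
    """Return headers, adding Content-Length if absent and easy to compute.

    Hypothetical gateway helper; headers is a list of (name, value) tuples.
    """
    if any(name.lower() == "content-length" for name, _ in headers):
        return headers  # the application already supplied one
    try:
        if len(body_iterable) != 1:
            return headers
    except TypeError:
        return headers  # generators and the like have no len()
    first = next(iter(body_iterable))
    return headers + [("Content-Length", str(len(first)))]

new_headers = add_content_length([("Content-Type", "text/plain")], [b"hello"])
```

In every other case the headers are returned unchanged and the gateway
falls back to whatever framing it would have used anyway.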

I do think that it is a good idea to include these clarifications
in (an addendum to) the WSGI spec, as these are all issues that are
often overlooked in implementations.

- Brian