[Web-SIG] HEAD requests, WSGI gateways, and middleware

Sun Jan 27 23:59:03 CET 2008

On 25/01/2008, Brian Smith <brian at briansmith.org> wrote:
> My application correctly responds to HEAD requests as-is. However, it doesn't work with middleware that sets headers based on the content of the response body.
>
> For example, a gateway or middleware that sets ETag based on a checksum, Content-Encoding, Content-Length and/or Content-MD5 will produce wrong results by default. Right now, my applications assume that any such gateway or the first such middleware will change environ["REQUEST_METHOD"] from "HEAD" to "GET" before the application is invoked, and discard the response body that the application generates.
>
> However, many gateways and middleware do not do this, and PEP 333 doesn't have anything to say about it. As a result, a 100% WSGI 1.0-compliant application is not portable between gateways.
>
> I suggest that a revision of PEP 333 should require the following behavior:
>
> 1. WSGI gateways must always set environ["REQUEST_METHOD"] to "GET" for HEAD requests. Middleware and applications will not be able to detect the difference between GET and HEAD requests.
>
> 2. For a HEAD request, a WSGI gateway must not iterate through the response iterable, but it must call the response iterable's close() method, if any. It must not send any output that was written via start_response(...).write() either. Consequently, WSGI applications must work correctly, and must not leak resources, when their output is not iterated; an application should not signal or log an error if the iterable's close() method is invoked without any iteration taking place.

Since there were no further follow-ups to this discussion, I see no
choice but to do number 1 above in Apache mod_wsgi. It is the only way
to guarantee that things work properly, because Apache has its own
output filtering system in which output headers can be set based on
the actual response content. If this is not done, the response headers
for GET and HEAD may not be the same.
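
As a rough illustration only, number 1 could be sketched as Python
middleware along the following lines. This is not mod_wsgi's code;
the name head_as_get is made up for the example, and it assumes the
wrapper itself is responsible for discarding the body, which in
Apache would normally be done by the server.

```python
def head_as_get(app):
    """Hypothetical wrapper: show HEAD requests to `app` as GET."""

    def wrapper(environ, start_response):
        if environ.get("REQUEST_METHOD") != "HEAD":
            return app(environ, start_response)

        # Copy the environ so the caller's dictionary is left untouched.
        environ = dict(environ, REQUEST_METHOD="GET")

        def discarding_start_response(status, headers, exc_info=None):
            start_response(status, headers, exc_info)
            # Anything sent through the legacy write() callable is dropped.
            return lambda data: None

        result = app(environ, discarding_start_response)
        try:
            # Run the body to completion, as a GET would, then discard it.
            for _chunk in result:
                pass
        finally:
            close = getattr(result, "close", None)
            if close is not None:
                close()
        return []

    return wrapper
```

The application only ever sees GET, so any headers it or later
middleware derive from the body are the same for both methods.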

As to number 2 (with later clarification), I will hold off on any
optimisation that skips processing of the iterable. This is partly
because it is unclear whether a WSGI adapter is allowed to skip
processing the iterable, but also because it gets a bit tricky in
Apache mod_wsgi daemon mode: you would need to pass information from
the Apache child process to the daemon process indicating whether any
output filters are registered in the Apache child process. Only with
that knowledge could the daemon process skip processing the iterable
and not generate any content.
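
To make the condition concrete, here is a small Python sketch of the
decision. It is purely hypothetical: finish_head_response and the
output_filters_present flag are invented for the example, and the
flag stands in for whatever the Apache child process would have to
tell the daemon process.

```python
def finish_head_response(result, output_filters_present):
    """Return the body chunks to hand on for a HEAD request."""
    chunks = []
    try:
        if output_filters_present:
            # A filter may derive headers from the body, so generate it.
            for chunk in result:
                chunks.append(chunk)
        # Otherwise the iterable is never iterated at all.
    finally:
        # close() is called in both cases, as number 2 requires.
        close = getattr(result, "close", None)
        if close is not None:
            close()
    return chunks
```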

Overall I think the basic problem here is that WSGI likes to think it
is the sole arbiter of what the response headers will be. In practice
this may not be the case when one is bridging from a true web server
which is capable of doing a lot of other stuff. For a WSGI adapter
where this can occur, there seems to be no choice but to change all
HEAD requests to GET requests.
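
The mismatch can be reproduced in plain Python. The sketch below is
only an illustration: add_content_length is a hypothetical stand-in
for anything that derives a header from the body (much as an Apache
output filter might), and response_headers is a toy driver, not a
real gateway.

```python
def app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    # A HEAD-aware application sends no body itself.
    if environ["REQUEST_METHOD"] == "HEAD":
        return []
    return [b"hello world"]


def add_content_length(app):
    """Hypothetical middleware deriving Content-Length from the body."""

    def wrapper(environ, start_response):
        captured = {}

        def capture(status, headers, exc_info=None):
            captured["status"] = status
            captured["headers"] = list(headers)
            return lambda data: None  # write() is ignored in this sketch

        result = app(environ, capture)
        try:
            body = b"".join(result)
        finally:
            close = getattr(result, "close", None)
            if close is not None:
                close()
        headers = captured["headers"] + [("Content-Length", str(len(body)))]
        start_response(captured["status"], headers)
        return [body]

    return wrapper


def response_headers(application, method):
    """Toy driver: call the application and return its headers."""
    seen = {}

    def start_response(status, headers, exc_info=None):
        seen["headers"] = dict(headers)
        return lambda data: None

    b"".join(application({"REQUEST_METHOD": method}, start_response))
    return seen["headers"]


wrapped = add_content_length(app)
print(response_headers(wrapped, "GET")["Content-Length"])   # 11
print(response_headers(wrapped, "HEAD")["Content-Length"])  # 0
```

The application handles HEAD correctly on its own terms, yet the
header computed from the body differs between the two methods.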

So, although I can fix Apache mod_wsgi so that HEAD works, this will
not help with other Apache solutions such as CGI, SCGI, FastCGI, AJP,
etc. For those, the WSGI adapters in use will each have to be fixed
separately to do a similar thing.

Graham