[Web-SIG] HEAD requests, WSGI gateways, and middleware

Graham Dumpleton graham.dumpleton at gmail.com
Fri Jan 25 04:43:54 CET 2008


On 25/01/2008, Brian Smith <brian at briansmith.org> wrote:
> Graham Dumpleton wrote:
> > The issue here is that Apache has its own output filtering
> > system where filters can set headers based on the actual
> > content. Because of this, any output filter must always
> > receive the response content regardless of whether the
> > request is a GET or HEAD. If an application handler tries to
> > optimise things and not return the content, then these output
> > filters may generate different headers for a HEAD request
> > than a GET request, thereby violating the requirement that
> > they should actually be the same.
> >
> > Note that response content is still thrown away for a HEAD
> > request, it is just done at the very last moment after all
> > Apache output filters have processed the data.
>
> Right, that is exactly what I am saying.
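
The effect described above can be sketched in WSGI terms. In this minimal sketch, a piece of middleware stands in for an Apache output filter: it buffers whatever the application returns and sets Content-Length from it. The names `app`, `length_filter` and `headers_for` are hypothetical. A real Apache filter such as mod_deflate operates on buckets inside the server, not through WSGI.

```python
def app(environ, start_response):
    # Hypothetical application that suppresses its body for HEAD,
    # i.e. the "optimisation" described above.
    start_response("200 OK", [("Content-Type", "text/plain")])
    if environ["REQUEST_METHOD"] == "HEAD":
        return []
    return [b"a" * 100]


def length_filter(wrapped):
    # Stand-in for an output filter that derives a header from the content.
    def middleware(environ, start_response):
        captured = {}

        def capture(status, headers, exc_info=None):
            captured["status"] = status
            captured["headers"] = list(headers)
            return lambda data: None  # write() is not modelled here

        result = wrapped(environ, capture)
        try:
            body = b"".join(result)
        finally:
            if hasattr(result, "close"):
                result.close()
        headers = [(k, v) for k, v in captured["headers"]
                   if k.lower() != "content-length"]
        headers.append(("Content-Length", str(len(body))))
        start_response(captured["status"], headers)
        return [body]
    return middleware


def headers_for(method):
    # Collect the headers the "filter" emits for a given request method.
    seen = {}

    def start_response(status, headers, exc_info=None):
        seen.update(headers)
        return lambda data: None

    length_filter(app)({"REQUEST_METHOD": method}, start_response)
    return seen
```

Here `headers_for("GET")` reports a Content-Length of "100" and `headers_for("HEAD")` reports "0". The application dropped its body before the "filter" could see it.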

To quote, in rule 2 you said:

"""For a HEAD request, A WSGI gateway must not iterate through the
response iterable"""

I was presuming this meant that the WSGI gateway should do this in
addition to changing the REQUEST_METHOD actually passed to the WSGI
application to GET.

If Apache mod_wsgi (the WSGI gateway) did do this, i.e., did not
iterate through the iterable and therefore did not pass the content
through to Apache, then, as explained, traditional Apache output
filters could yield incorrect results. This is what I am highlighting.

So Apache mod_wsgi could not avoid processing the iterable unless,
as you allude to regarding how Apache internals are used to implement
wsgi.file_wrapper support, mod_wsgi similarly detected when no Apache
output filters that could add headers are registered, and skipped the
processing in that case.
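
A minimal sketch of the gateway behaviour being discussed. It assumes a hypothetical `filters` list of Python callables in place of Apache's real output filter chain; mod_wsgi exposes nothing like this. For a HEAD request, the body is skipped only when nothing downstream could use it. Otherwise, it is run through every filter and discarded at the last moment.

```python
def serve(app, environ, filters):
    """Run `app` for one request (hypothetical gateway loop).

    `filters` is a hypothetical list of callables, each taking and
    returning a (headers, body) pair, standing in for Apache's filters.
    """
    response = {}

    def start_response(status, headers, exc_info=None):
        response["status"] = status
        response["headers"] = list(headers)
        return lambda data: None  # write() is not modelled in this sketch

    is_head = environ["REQUEST_METHOD"] == "HEAD"
    result = app(environ, start_response)
    try:
        if is_head and not filters and "status" in response:
            # Nothing downstream can depend on the content, so skip
            # iterating. (A generator-based app may not have called
            # start_response yet, in which case we must iterate anyway.)
            return response["status"], response["headers"], b""
        body = b"".join(result)
    finally:
        if hasattr(result, "close"):
            result.close()

    headers = response["headers"]
    for output_filter in filters:
        headers, body = output_filter(headers, body)
    if is_head:
        body = b""  # discarded only after every filter has seen the content
    return response["status"], headers, body


def content_length_filter(headers, body):
    # Example filter: derives a header from the actual content.
    headers = [(k, v) for k, v in headers if k.lower() != "content-length"]
    return headers + [("Content-Length", str(len(body)))], body


def demo_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"x" * 10]
```

With `content_length_filter` in the chain, HEAD and GET report the same Content-Length, and only GET carries a body.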

Some clarification of rule 2 is perhaps required.

> In Apache's documentation, it
> says that every handler should include the response entity for HEAD
> requests, so that output filters can process the output. However, there
> is nothing in PEP 333 that talks about this behavior. So, the only
> reasonable thing to do is to assume that, when environ["REQUEST_METHOD"]
> == "HEAD", no response entity should be generated. Do we all agree that
> the following application is correct?
>
>         def application(env, start_response):
>                 start_response("200 OK",
>                         [("Content-Length", "10000")])
>                 if env["REQUEST_METHOD"] == "HEAD":
>                         return []
>                 else:
>                         return ["a"*10000]
>
> Because of web servers' output filters, if the WSGI gateway is a web
> server module or a [Fast]CGI script, then it needs to lie and tell the
> application that the request is a "GET", not a "HEAD." Otherwise, the
> application will see that the request method is "HEAD" and suppress its
> own response entity, as the HTTP specification requires, and the output
> filters will fail. The only time it is reasonable for the gateway to
> pass "HEAD" as the request method is when it knows that there are not
> any output filters/middleware that depend on the response entity.
> Usually that is only possible in standalone web servers like CherryPy's
> or Paste's.
>
> I tested this in mod_wsgi and mod_wsgi gets it wrong. mod_wsgi sets
> env["REQUEST_METHOD"] to "HEAD" for HEAD requests.

It just passes whatever Apache sets up as the CGI environment.
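
For comparison, a gateway that followed the suggestion in the quoted text would build the application's environ roughly as in the sketch below. This sketches the proposal in this thread, not what PEP 333 or mod_wsgi does. `environ_for_app` and its `filters_present` argument are hypothetical; the latter stands in for whatever the server knows about its filter chain. The sketch follows the conditional form in the quoted paragraph (report GET only when something downstream may need the entity), whereas rule 1 as written would rewrite unconditionally.

```python
def environ_for_app(server_environ, filters_present):
    """Return the environ to pass to the WSGI application (hypothetical)."""
    environ = dict(server_environ)  # leave the server's own copy untouched
    if environ.get("REQUEST_METHOD") == "HEAD" and filters_present:
        # Hide HEAD from the application so it generates the full entity
        # for the output filters; the gateway still knows, via
        # server_environ, that the body must be dropped before sending.
        environ["REQUEST_METHOD"] = "GET"
    return environ
```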

> When mod_deflate is
> enabled, a HEAD request returns "Content-Length: 20", and a GET request
> returns "Content-Length: 46". However, it is supposed to be
> "Content-Length: 46" in both cases.

Is this with your sample application, which detects HEAD and returns
nothing in that case? In other words, is the difference driven by what
your application actually returns?

I am not saying your application is right or wrong; I am just trying
to determine whether you are saying there is a problem in Apache
mod_wsgi, separate from what it passes as REQUEST_METHOD, that causes
this.
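
As a rough illustration of why the lengths would differ if the compressing filter only sees what the application returns: gzip-compressing an empty body and a 10000-byte body yields streams of different sizes. Python's `gzip` module is used here only as a stand-in for mod_deflate, so the exact byte counts need not match the 20 and 46 reported above.

```python
import gzip


def compressed_length(body: bytes) -> int:
    # Size of the gzip stream a compressing filter would emit for `body`
    # (Python's gzip as a stand-in for mod_deflate).
    return len(gzip.compress(body))


get_len = compressed_length(b"a" * 10000)  # what the sample app returns for GET
head_len = compressed_length(b"")          # what it returns for HEAD
print(get_len, head_len)  # different values, so Content-Length differs too
```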

> The CGI WSGI gateway in PEP 333 gets
> it wrong too when mod_deflate is used.
>
> Note also that in mod_wsgi, use of wsgi.file_wrapper is a huge
> optimization for this: if no Apache output filters need the response
> entity, and wsgi.file_wrapper is used, then the file will never be read
> off the disk.

Hmmm, I didn't actually look under the covers of what Apache did when
I used its file bucket for that. Worked out better than I expected
then. :-)
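
For reference, here is a sketch of how an application typically takes advantage of `wsgi.file_wrapper`, falling back when the gateway does not provide it. The optional-extension pattern comes from PEP 333; `make_file_app`, `FileChunks` and `demo` are hypothetical names. In the fallback branch, the file has to be read through the application's own iterable.

```python
import io


class FileChunks:
    """Fallback iterable: reads a file in blocks and closes it in close()."""

    def __init__(self, f, block_size):
        self.f = f
        self.block_size = block_size

    def __iter__(self):
        return iter(lambda: self.f.read(self.block_size), b"")

    def close(self):
        self.f.close()


def make_file_app(opener, block_size=8192):
    """Build a WSGI app serving the file returned by `opener()`."""
    def app(environ, start_response):
        f = opener()
        start_response("200 OK", [("Content-Type", "application/octet-stream")])
        file_wrapper = environ.get("wsgi.file_wrapper")
        if file_wrapper is not None:
            # Hand the file to the gateway; per the discussion above,
            # mod_wsgi can then avoid reading it when nothing needs the data.
            return file_wrapper(f, block_size)
        # No wrapper available: the data goes through this iterable.
        return FileChunks(f, block_size)
    return app


# Example: serve an in-memory "file".
demo = make_file_app(lambda: io.BytesIO(b"hello world"), block_size=4)
```

With an opener such as `lambda: open(path, "rb")`, the same application works under gateways with and without the extension.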

> But, if wsgi.file_wrapper is not used, then the entire
> file has to be read off the disk through the application's output
> iterable for no reason. It would be nice if the non-file_wrapper case
> worked as well as the file_wrapper case.
>
> If you put all this together, you end up with the rules that I outlined
> in my previous message:

Except that, as pointed out, rule 2 says I should never pass on content
from the iterable for HEAD, whereas in practice I still have to if there
are output filters.

Pardon me if I am not understanding very well; I did not get much
sleep last night because of the baby, and my head hurts. :-(

Graham

> > 1. WSGI gateways must always set environ["REQUEST_METHOD"] to
> >    "GET" for HEAD requests. Middleware and applications will
> >    not be able to detect the difference between GET and HEAD
> >    requests.
> >
> > 2. For a HEAD request, A WSGI gateway must not iterate
> >    through the response iterable, but it must call the
> >    response iterable's close() method, if any. It must not
> >    send any output that was written via
> >    start_response(...).write() either. Consequently,
> >    WSGI applications must work correctly, and must not
> >    leak resources, when their output is not iterated;
> >    an application should not signal or log an error if
> >    the iterable's close() method is invoked without any
> >    iteration taking place.
>
> - Brian
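
The last requirement of rule 2 is easy to get wrong with a plain generator. If a generator is closed before its first `next()` call, none of its body runs, so a `finally` clause inside it cannot be relied on to release a resource acquired before iteration began. Below is a minimal sketch of a response iterable that releases its resource in `close()` whether or not it was iterated. `ClosingBody` is a hypothetical name, and `io.BytesIO` stands in for a real file or database cursor.

```python
import io


class ClosingBody:
    """Response iterable whose close() releases its resource even if the
    gateway never iterated it."""

    def __init__(self, resource, block_size=8192):
        self.resource = resource
        self.block_size = block_size

    def __iter__(self):
        while True:
            chunk = self.resource.read(self.block_size)
            if not chunk:
                return
            yield chunk

    def close(self):
        # Called by the gateway in every case; nothing is raised or
        # logged if iteration never took place.
        self.resource.close()


def application(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return ClosingBody(io.BytesIO(b"example body"))
```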