[Web-SIG] HEAD requests, WSGI gateways, and middleware

Fri Jan 25 05:05:59 CET 2008

Graham Dumpleton wrote:
> To quote, in 2 you said:
> 
> """For a HEAD request, A WSGI gateway must not iterate 
> through the response iterable"""
> 
> I was presuming that this was saying that the WSGI gateway 
> should do this as well as changing the REQUEST_METHOD 
> actually sent to the WSGI application to GET.

I misstated it. It should be "For a HEAD request, a WSGI gateway *may*
skip iterating through the response iterable". That is, if the gateway
can detect that the response entity isn't going to change the final set
of headers in any way, it can skip the iteration.
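
A rough sketch of that rule in Python, for illustration only. The names
serve and body_filters_present are made up here and are not part of WSGI
or mod_wsgi. The sketch assumes the gateway already knows whether anything
downstream needs the body, and it only skips the iteration if the
application called start_response before returning (a generator-based
application might not call it until it is iterated).

```python
def serve(app, environ, body_filters_present):
    """Hypothetical gateway core: returns (status, headers, body_bytes)."""
    state = {}

    def start_response(status, headers, exc_info=None):
        state["status"] = status
        state["headers"] = list(headers)
        # Legacy write() callable; ignored in this sketch.
        return lambda data: None

    result = app(environ, start_response)
    chunks = []
    try:
        # Skipping is only safe when nothing downstream needs the body
        # and the status/headers are already known.
        may_skip = (
            environ["REQUEST_METHOD"] == "HEAD"
            and not body_filters_present
            and "status" in state
        )
        if not may_skip:
            for chunk in result:
                chunks.append(chunk)
    finally:
        # The iterable is closed whether or not it was iterated.
        if hasattr(result, "close"):
            result.close()

    # A HEAD response never carries a body on the wire.
    body = b"" if environ["REQUEST_METHOD"] == "HEAD" else b"".join(chunks)
    return state["status"], state["headers"], body
```

With body_filters_present set to True the iterable is still consumed for
HEAD, which is the case where an output filter might rewrite the headers.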

> If Apache mod_wsgi (the WSGI gateway) does then do this, ie., 
> didn't iterate through the iterable and therefore didn't 
> return the content through to Apache, it would as explained 
> cause traditional Apache output filters to potentially yield 
> incorrect results. This is what I am highlighting.
> 
> So Apache mod_wsgi couldn't avoid processing the iterable, 
> unless as you allude to with how internals of how Apache is 
> used to implement wsgi.file_wrapper support, that mod_wsgi 
> similarly detected when no Apache output filters are 
> registered that could add additional headers and skip the processing.

Right, my idea was that mod_wsgi could implement a new bucket type that
iterates over the response only when some output filter actually reads
from the bucket. If no output filter reads from it, the iteration never
happens.
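
To illustrate the idea only: real Apache buckets are C structures, so the
following is a Python analogy and not mod_wsgi code. LazyIterableBucket,
run_filters, and the two example filters are hypothetical names. The point
is that the response iterable is consumed on the first read, and not at
all if no filter reads.

```python
class LazyIterableBucket:
    """Hypothetical stand-in for a bucket whose data is produced on first read."""

    def __init__(self, iterable):
        self._iterable = iterable
        self._data = None
        self.was_read = False

    def read(self):
        # The response iterable is consumed only when a filter asks for data.
        if self._data is None:
            self.was_read = True
            self._data = b"".join(self._iterable)
        return self._data


def server_header_filter(bucket, headers):
    # Adds a header without touching the body, so it never triggers iteration.
    headers["Server"] = "example"


def length_filter(bucket, headers):
    # Needs the body, so it forces the iteration.
    headers["Content-Length"] = str(len(bucket.read()))


def run_filters(bucket, filters):
    headers = {}
    for output_filter in filters:
        output_filter(bucket, headers)
    return headers
```

Running only server_header_filter leaves the bucket unread; adding
length_filter to the chain forces the iteration.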

> >         def application(env, start_response):
> >                 start_response("200 OK",
> >                         [("Content-Length", "10000")])
> >                 if env["REQUEST_METHOD"] == "HEAD":
> >                         return []
> >                 else:
> >                         return ["a"*10000]
> >
> > I tested this in mod_wsgi and mod_wsgi gets it wrong. mod_wsgi sets 
> > env["REQUEST_METHOD"] to "HEAD" for HEAD requests.
> 
> It just passes whatever Apache sets up as the CGI environment.
> 
> > When mod_deflate is
> > enabled, a HEAD request returns "Content-Length: 20", and a GET 
> > request returns "Content-Length: 46". However, it is supposed to be
> > "Content-Length: 46" in both cases.
> 
> Is this with your sample application which detects HEAD and 
> doesn't return anything if it is found. In other words, it is 
> driven by what your application is actually returning?

Yes, these results are from the program above. The 10,000 a's compress
down to 26 bytes, plus the 20-byte header, for 46 bytes in total. For the
HEAD case, mod_deflate compresses 0 bytes to 0 bytes and adds the same
20-byte header.
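
The sizes can be checked roughly with Python's zlib module. This is a
sketch that assumes the gzip container at a default-like compression
level; gzip_size is a made-up helper, and mod_deflate's actual settings
may differ. An empty input should come out at the 20 bytes of fixed
overhead, while the exact size for the 10,000-byte body depends on the
zlib version and level, so no figure is asserted for it here.

```python
import zlib


def gzip_size(data, level=6):
    """Return the size in bytes of data compressed into a gzip stream."""
    # wbits=31 selects the gzip container (header and trailer) around
    # the raw deflate data.
    compressor = zlib.compressobj(level, zlib.DEFLATED, 31)
    return len(compressor.compress(data) + compressor.flush())


head_size = gzip_size(b"")           # what the HEAD-aware application yields
get_size = gzip_size(b"a" * 10000)   # what the GET branch yields
print(head_size, get_size)
```

The two sizes differ, which is why an application that returns an empty
body for HEAD ends up with a different Content-Length once compression is
applied.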

> > Note also that in mod_wsgi, use of wsgi.file_wrapper is a huge 
> > optimization for this: if no Apache output filters need the 
> > response entity, and wsgi.file_wrapper is used, then the file
> > will never be read off the disk.
> 
> Hmmm, I didn't actually look under the covers of what Apache 
> did when I used its file bucket for that. Worked out better 
> than I expected then. :-)

I will double-check, but I believe that in embedded mode the file never
gets read at all when no output filters are processing the output. I will
bring it up on the mod_wsgi list.

> Except as pointed out that 2 suggests I should never pass on 
> content from iterable for HEAD, where in practice I still 
> have to if there are output filters.
>
> Pardon me if I am not understanding very well, I did not get 
> much sleep last night because of baby and my head hurts. :-(

Not your (or your daughter's) fault; I wrote something different from
what I meant. I hope tonight is easier on you. Good luck!

Regards,
Brian