[Web-SIG] Internal redirect using Location with absolute path and status of 200.

Sat May 26 06:33:38 CEST 2007

On 25/05/07, Ian Bicking <ianb at colorstudy.com> wrote:
> Graham Dumpleton wrote:
> > On 24/05/07, Ian Bicking <ianb at colorstudy.com> wrote:
> >> Graham Dumpleton wrote:
> >> > Does anyone think this would be nice extension for a WSGI adapter
> >> > written against current specification to implement even if not
> >> > necessarily portable?
> >>
> >> Eh.  In the context of mod_wsgi, I think it would be more interesting to
> >> provide a WSGI application that called back into Apache (basically
> >> wrapping Apache's normal subrequest machinery in a WSGI exterior).
> >
> > I was trying to avoid as much as possible having mod_wsgi provide any
> > sort of hooks which would allow one to perform actions against
> > internals of Apache. I had two reasons for this.
>
> This is a much more constrained hook into Apache than what mod_python
> provides.  For instance, you could provide much the same thing, but
> where subrequests actually go out over HTTP.  There's quite a bit of
> data you couldn't share over HTTP, so it's not entirely equivalent, but
> it's still pretty close (especially if there was something on the Apache
> side to fix up the slightly-richer-than-HTTP environment based on
> special headers).

If I am going to provide any way of interacting with Apache internals,
though, I don't want to be in the business of writing custom wrappers
for specific tasks. That would turn mod_wsgi into just another
framework rather than the absolute minimal WSGI adapter it is.

When I did look at allowing it to be more than just an adapter, the
approach I considered for what you want to do was simply to pass the
SWIG wrapper for the Apache request object (request_rec) through in
the WSGI environment.

I wouldn't see this being enabled by default, though, as it would make
mod_wsgi depend on all the SWIG bindings for Apache and would also be
a way of circumventing the restrictions on what the user is allowed to
do with mod_wsgi. I.e., I am trying to make mod_wsgi as safe as
possible so that web hosting companies might consider it for use in
shared hosting environments. If it were to have just as many problems
and unknowns as mod_python, the whole exercise of writing it would be
a waste of time.

All this means is that to enable the feature you would first need to
specify a configuration directive in the main Apache configuration
something like:

  WSGIExtensions RequestRec

This would just indicate that passing a request object is allowed;
you would still then need to enable it for a specific application
(part of the URL namespace):

  WSGIPassRequestRec On

That done, the request object could be accessed as
'apache.request_rec' in the WSGI environment.

Although only the request object is being passed that is enough, as
the separate SWIG bindings for the Apache API, which would not even be
a part of mod_wsgi but a separate package, would then provide
everything else. The SWIG bindings would though just be a direct
mapping to the C API with no real wrapping giving it a Pythonic feel.

Thus for example your internal redirect would be written something like:

  from apache.http_request import *

  def application(environ, start_response):

    r = environ['apache.request_rec']
    ap_internal_redirect('/some/other/path', r)

    # Dummy WSGI response as redirect already sent response.
    start_response('200 OK', [])
    return []

If desired, people could then write WSGI component objects which wrap
up such low level Apache API calls. One example is obviously an
internal redirect, but another may be to use
apache.mod_ssl.ssl_var_lookup() to look up specific properties of a
client side SSL certificate which wouldn't otherwise be available to
a WSGI application.
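
As a rough sketch of such a component, assuming the
'apache.request_rec' key and the ap_internal_redirect(path, r) calling
convention proposed above. The factory name internal_redirect_app and
the injected redirect parameter are made up for illustration, so that
the sketch can be exercised without the SWIG bindings being installed:

```python
def internal_redirect_app(path, redirect):
    """Build a WSGI application that internally redirects to `path`.

    `redirect` is expected to behave like ap_internal_redirect(path, r)
    from the proposed SWIG bindings. Passing it in, rather than
    importing it, is an assumption made for this sketch.
    """
    def application(environ, start_response):
        # Request object as it would be passed with WSGIPassRequestRec On.
        r = environ['apache.request_rec']
        redirect(path, r)

        # Dummy WSGI response, as the redirect already sent the response.
        start_response('200 OK', [])
        return []

    return application
```

With the bindings available this would presumably be used as
internal_redirect_app('/some/other/path', ap_internal_redirect).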

For cases like accessing SSL certificate information using the API
there wouldn't be a big problem, but one problem with something like
internal redirects is that the way WSGI applications return a response
isn't a direct mapping to the lower level Apache handler response but
is more complicated than that. Thus you end up having to use some sort
of dummy response which wouldn't add to what a sub request may have
already returned. Alternatively, you have to provide something else in
the WSGI environment which the application could use to raise an
exception that mod_wsgi would then catch and take to mean that the
normal WSGI application response processing should be skipped, while
the normal Apache API status value of OK (which is different to
HTTP_OK) is still returned.
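
A minimal sketch of the exception idea, simulated in plain Python
rather than inside Apache. The exception class ResponseAlreadySent,
the environ key 'apache.response_sent' and the function
run_wsgi_application are all hypothetical names invented for this
example, and the write() callable is left out for brevity:

```python
OK = 0  # Apache handler status OK, not the HTTP status 200


class ResponseAlreadySent(Exception):
    """Hypothetical signal: the response was produced via the Apache API."""


def run_wsgi_application(application, environ):
    """Return (handler_status, wsgi_response) as an adapter might.

    wsgi_response is None when the application raised the signal,
    meaning the usual WSGI response processing is skipped.
    """
    environ = dict(environ)
    # Hypothetical key through which the application finds the exception.
    environ['apache.response_sent'] = ResponseAlreadySent

    response = {}

    def start_response(status, headers, exc_info=None):
        response['status'] = status
        response['headers'] = list(headers)

    try:
        # Consume the iterable inside the try block, since a generator
        # based application only runs (and raises) while iterated.
        body = b''.join(application(environ, start_response))
    except ResponseAlreadySent:
        return OK, None

    return OK, (response['status'], response['headers'], body)
```

An application would call the low level API and then do
raise environ['apache.response_sent'](), which ties it to an adapter
that provides that key.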

In other words, the mismatch between the APIs, and the fact that the
WSGI interface is not as rich as the Apache handler API in how a
handler response and the HTTP status can be indicated, can make it all
just a bit messy. Also, some of the things one can do through the
Apache API step outside the flow of operations for WSGI applications.

Just as a comparison, if using the Apache API directly, it would have
been written as:

  from apache.httpd import *
  from apache.http_request import *

  def handler(r):

    ap_internal_redirect('/some/other/path', r)

    return DONE

Important here is that the value DONE is being returned, which
indicates to Apache that a complete response has been provided, by
virtue of the sub request, and that nothing more should be done for
the parent request, even if further handlers happen to be registered
for the response handler phase.

This is in contrast to a standard Apache type handler which might have been:

  from apache.httpd import *
  from apache.http_protocol import *

  def handler(r):

    content = 'hello world!\n'

    ap_set_content_type(r, 'text/plain')
    ap_set_content_length(r, len(content))

    ap_rwrite(content, r)

    return OK

So in practice they are quite different worlds and my feeling is that
allowing WSGI applications to call back into the Apache internals may
just cause more problems than it's worth, especially since you can't
properly represent the low level Apache handler response in the
response to the WSGI application.

Another mismatch, and one I am already having to contend with in
mod_wsgi, is that an HTTP error status can be indicated in two ways.
The first is:

  from apache.httpd import *
  from apache.http_protocol import *

  def handler(r):

    content = 'NOT FOUND. GO AWAY.\n'

    r.status = HTTP_NOT_FOUND
    ap_set_content_type(r, 'text/plain')
    ap_set_content_length(r, len(content))

    ap_rwrite(content, r)

    return OK

By returning OK here and using r.status to indicate the HTTP status
code, it tells Apache that I have already provided a response body and
thus it shouldn't try and provide one through processing ErrorDocument
directives or by adding its own default.

The other option is:

  from apache.httpd import *

  def handler(r):

    return HTTP_NOT_FOUND

In this case, because the HTTP status code was returned as the actual
response, it indicates that I haven't provided a response body and
thus Apache should instead try and provide one.

In WSGI, when something like:

    start_response('404 Not Found', [])

is used, it is still up to the WSGI application to provide a response
body with the content of the page to be displayed in the browser. If
it doesn't then the browser will present a blank page.

What I don't know is whether mod_wsgi should try to be smart and
detect the case where an HTTP error status is returned with no
response body. Instead of doing the equivalent of setting r.status =
HTTP_NOT_FOUND and returning OK, it would then just return
HTTP_NOT_FOUND as the result, as in the second example, so that Apache
gets the chance to provide an error page since the WSGI application
didn't actually provide one.
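
To make the two outcomes concrete, here is a sketch of that decision
simulated in plain Python. The function handler_result is a
hypothetical name, not anything in mod_wsgi, and the whole response is
buffered purely to keep the example short; a real adapter would only
need to know whether any body data is produced at all:

```python
OK = 0  # Apache handler status OK, not the HTTP status 200


def handler_result(application, environ):
    """Return (handler_return_value, r_status, body).

    r_status stands in for what would be assigned to r.status. It is
    None when the error is handed back to the server to deal with.
    """
    response = {}
    written = []

    def start_response(status, headers, exc_info=None):
        response['code'] = int(status.split(' ', 1)[0])
        return written.append  # buffered stand-in for write()

    # Iterate first so anything passed to write() has been collected.
    tail = b''.join(application(environ, start_response))
    body = b''.join(written) + tail

    code = response['code']
    if code >= 400 and not body:
        # Error status with no body: return the HTTP status itself so
        # the server can supply a page (ErrorDocument or its default).
        return code, None, b''

    # Otherwise the application's own response stands as given.
    return OK, code, body
```

An application doing start_response('404 Not Found', []) and returning
[] takes the first path, while one that returns its own error text
takes the second.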

I know I am rambling, but if you have got this far and followed what I
am going on about in the last example, I might ask what you do about
error pages. How do you ensure a consistent error page layout across a
whole WSGI application containing many disparate components?

Do you try and pass down through the WSGI environment some special
hook an application can call to generate error pages with the same
style, at the cost of creating a dependency on that hook existing? Do
you instead allow a WSGI application to return an empty error page and
have a higher up WSGI middleware component catch that and substitute
its own based on the error type, i.e., in similar style to Apache and
the ErrorDocument directive? Or do you do something else?
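
The middleware option could look something like the following sketch.
The names error_document_middleware and render_page are invented for
illustration, and the response is buffered in full, which a production
component would probably want to avoid:

```python
def error_document_middleware(application, render_page):
    """Substitute a page when an error response has an empty body.

    render_page(status) is a caller supplied function (hypothetical)
    returning the HTML text to use for a status line such as
    '404 Not Found'.
    """
    def wrapper(environ, start_response):
        response = {}
        written = []

        def capturing_start_response(status, headers, exc_info=None):
            response['status'] = status
            response['headers'] = list(headers)
            return written.append  # buffered stand-in for write()

        result = application(environ, capturing_start_response)
        try:
            tail = b''.join(result)
        finally:
            # WSGI requires close() to be called if the iterable has it.
            if hasattr(result, 'close'):
                result.close()
        body = b''.join(written) + tail

        status = response['status']
        headers = response['headers']
        code = int(status.split(' ', 1)[0])

        if code >= 400 and not body:
            # Empty error response: supply the common error page.
            body = render_page(status).encode('utf-8')
            headers = [(name, value) for name, value in headers
                       if name.lower() not in ('content-type',
                                               'content-length')]
            headers.append(('Content-Type', 'text/html; charset=utf-8'))
            headers.append(('Content-Length', str(len(body))))

        start_response(status, headers)
        return [body]

    return wrapper
```

Components then only need to agree that an empty body on an error
status means "use the common page"; they do not need to know about any
special hook in the WSGI environment.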

The problem then as above is what does one do at the boundary between
a WSGI application and the web server hosting it? Do you just always
assume a WSGI application provides an error page, or allow some way
that a WSGI application can defer to the web server the task of
generating an error page instead?

Graham