[Web-SIG] Standardized configuration

Ian Bicking ianb at colorstudy.com
Tue Jul 19 04:57:40 CEST 2005


Graham Dumpleton wrote:
> My understanding from reading the WSGI PEP and examples like that above is
> that the WSGI middleware stack concept is very much tree-like, but where at
> any specific node within the tree, one can only traverse into one child.
> I.e., a parent middleware component could make a decision to defer to one
> child or another, but there is no means of really trying out multiple
> choices until you find one that is prepared to handle the request. The only
> way around it seems to be to make the linear chain of nested applications
> longer and longer, something which to me just doesn't sit right. In some
> respects the need for the configuration scheme is in part to make that less
> unwieldy.

It's not at all limited to this; linear chains of middleware are simply
the ones that are easy to configure, and can be inserted into a stack
without changing the stack very much.

> What I am doing is making it acceptable for a handler to also return None.
> If this were returned by the highest level handler, it would equate to
> being the same as DECLINED, but within the context of middleware components
> it has a slightly relaxed meaning. Specifically, it indicates that that
> handler isn't returning a response, but not that the request as a whole is
> being DECLINED, causing a return to Apache.

Incidentally, I'd typically use an exception when the return value 
didn't include the semantics I wanted, but that might not be a problem here.
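To illustrate the exception-based alternative, here is a minimal sketch of sequencing handlers where "not mine" is signalled by raising rather than by returning None. `Declined`, `run_handlers` and the toy handlers are hypothetical names, not part of mod_python or WSGI, and a "request" is just a dict here for the sake of a self-contained example.

```python
class Declined(Exception):
    """Hypothetical exception a handler raises to mean 'not mine, try the next'."""


def run_handlers(handlers, request):
    """Call each handler in turn; the first one that doesn't raise Declined wins.

    Unlike a None return value, an exception can't be confused with a real
    result, and a handler can decline from deep inside helper functions.
    """
    for handler in handlers:
        try:
            return handler(request)
        except Declined:
            continue
    # Nobody accepted it: propagate, like DECLINED at the top level.
    raise Declined("no handler accepted %r" % request.get("path"))


# Two toy handlers; a "request" here is just a dict (an assumption).
def stylesheets(request):
    if not request["path"].endswith(".css"):
        raise Declined()
    return "stylesheet " + request["path"]


def pages(request):
    if not request["path"].endswith(".html"):
        raise Declined()
    return "page " + request["path"]
```

For example, run_handlers([stylesheets, pages], {"path": "/index.html"}) skips the stylesheet handler and returns "page /index.html".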

> One last example is what a session-based login mechanism might look like,
> since this was one of the examples posed in the initial discussion. Here
> you might have a handler for a whole directory which contains:
> 
> _userDatabase = _users.UserDatabase()
> 
> handler = Handlers(
>     IfLocationMatches(r"\.bak(/.*)?$",NotFound()),
>     IfLocationMatches(r"\.tmpl(/.*)?$",NotFound()),
> 
>     IfLocationIsADirectory(ExternalRedirect('index.html')),
> 
>     # Create session and stick it in request object.
>     CreateUserSession(),
> 
>     # Login form shouldn't require user to be logged in to access it.
>     IfLocationMatches(r"^/login\.html(/.*)?$",CheetahModule()),
> 
>     # Serve requests against login/logout URLs and otherwise
>     # don't let request proceed if user not yet authenticated.
>     # Will redirect to login form if not authenticated.
>     FormAuthentication(_userDatabase,"login.html"),
> 
>     SetResponseHeader('Pragma','no-cache'),
>     SetResponseHeader('Cache-Control','no-cache'),
>     SetResponseHeader('Expires','-1'),
> 
>     IfLocationMatches(r"/.*\.html(/.*)?$",CheetahModule()),
> )
> 
> Again, one has done away with the need for configuration files, as the
> code itself specifies what is required, along with the constraints as to
> what order things should be done in.
> 
> Another thing this example shows is that handlers, when they return None
> because they are not returning an actual response, can still add to the
> response headers, in the way of special cookies as required by sessions,
> or headers controlling caching etc.

This is not possible in WSGI middleware if handled in a chain-like 
fashion.  Nested middleware can do this, of course.

This kind of chaining would be necessary if "services" were used, as
many services have to affect the response, and there's no WSGI-related
spec about where or how they would do that.  Though I haven't digested
all the long emails lately...

> In terms of late binding of which handler is executed, the "PythonModule"
> handler is one example in that it selects which Python module to load only
> when the request is being handled. Another example of late construction of
> an instance of a handler in what I am doing, albeit the same type, is:
> 
>   class Handler:
> 
>     def __init__(self,req):
>       self.__req = req
> 
>     def __call__(self,name="value"):
>       self.__req.content_type = "text/html"
>       self.__req.send_http_header()
>       self.__req.write("<html><body>")
>       self.__req.write("<p>name=%r</p>"%cgi.escape(name))
>       self.__req.write("</body></html>")
>       return apache.OK
> 
>   handler = IfExtensionEquals("html",HandlerInstance(Handler))
> 
> First off, the "HandlerInstance" object is only triggered if the request
> against this specific file-based resource was by way of a ".html"
> extension. Only when it is triggered is an instance of "Handler" created,
> with the request object being supplied to the constructor.

Incidentally, I'm doing something a little like that with the 
filebrowser example in Paste:

http://svn.pythonpaste.org/Paste/trunk/examples/filebrowser/web/__init__.py

Looking at it now, it's not clear where that's happening, but (in 
application()) context.path(path) creates a WSGI application using a 
class based on the extension/expected mime type.  So the dispatching is 
similar.
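A rough WSGI analogue of Graham's IfExtensionEquals plus HandlerInstance, and of the extension-based dispatch described above, might look like the sketch below. It is not the Paste filebrowser code: `ByExtension`, `HtmlPage` and the convention that a per-request class takes `environ` in its constructor are assumptions for illustration. The class is only instantiated once a request with a matching extension arrives.

```python
import html
import posixpath


class HtmlPage(object):
    """Hypothetical per-request application, constructed once per request."""

    def __init__(self, environ):
        self.environ = environ

    def __call__(self, start_response):
        name = posixpath.basename(self.environ.get("PATH_INFO", ""))
        body = "<html><body><p>name=%s</p></body></html>" % html.escape(name)
        start_response("200 OK", [("Content-Type", "text/html")])
        return [body.encode("utf-8")]


class ByExtension(object):
    """WSGI app that picks a class by file extension and instantiates it lazily."""

    def __init__(self, classes, fallback):
        self.classes = classes    # e.g. {".html": HtmlPage}
        self.fallback = fallback  # ordinary WSGI app for anything else

    def __call__(self, environ, start_response):
        ext = posixpath.splitext(environ.get("PATH_INFO", ""))[1]
        cls = self.classes.get(ext)
        if cls is None:
            return self.fallback(environ, start_response)
        # Late construction: the instance exists only for this request.
        return cls(environ)(start_response)


def not_found(environ, start_response):
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Not Found"]
```

With app = ByExtension({".html": HtmlPage}, not_found), a request for /docs/intro.html builds a fresh HtmlPage, while /logo.png falls through to not_found.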

> To round this off, the special "Handlers" handler only contains the
> following code. Pretty simple, but it makes construction of the component
> hierarchy a bit easier in my mind when multiple things need to be done in
> turn where nesting isn't strictly required.
> 
>   class Handlers:
> 
>     def __init__(self,*handlers):
>         self.__handlers = handlers
> 
>     def __call__(self,req):
>         if len(self.__handlers) != 0:
>             for handler in self.__handlers:
>                 result = _execute(req,handler,lazy=True)
>                 if result is not None:
>                     return result
> 
> Would be very interested to see how people see this relating to what is
> possible with WSGI. Could one create a similar sort of class to "Handlers"
> in WSGI to sequence through WSGI applications until one generates a
> complete response?
>
> What has me thinking the answer is "no" is that I recollect the PEP saying
> that the "start_response" object can only be called once, which precludes
> applications in a list adding to the response headers without returning a
> valid status. Secondly, if the "start_response" object hasn't been called
> when the parent starts to try and construct the response content from the
> result of calling the application, it raises an error. But then, I have a
> distinct lack of proper knowledge of WSGI so could be wrong.

When you just want to add headers (like with a session) you can use 
wrapping middleware, which appends to its application's response 
headers, but doesn't create a full response on its own.
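A minimal sketch of such wrapping middleware, roughly equivalent to the SetResponseHeader entries in Graham's example. `AddHeaders` and `hello_app` are hypothetical names; the wrapper forwards the wrapped application's status and body untouched and only extends the header list on its way to the real start_response.

```python
class AddHeaders(object):
    """Wrap a WSGI app and append extra headers to whatever it sends."""

    def __init__(self, app, extra_headers):
        self.app = app
        self.extra_headers = list(extra_headers)

    def __call__(self, environ, start_response):
        def add_headers_start_response(status, headers, exc_info=None):
            # Append rather than replace, so the wrapped app's headers survive.
            return start_response(status, list(headers) + self.extra_headers, exc_info)
        return self.app(environ, add_headers_start_response)


def hello_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


no_cache_app = AddHeaders(hello_app, [
    ("Pragma", "no-cache"),
    ("Cache-Control", "no-cache"),
    ("Expires", "-1"),
])
```

A session middleware would follow the same pattern, appending a Set-Cookie header instead of the cache-control ones.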

As for the order, when there's an issue you can cache the call.  For 
instance, if I want to look at what gets passed to start_response before 
passing it up to the server, I create a fake start_response that just 
saves the values.  Or sometimes a start_response that merely watches the 
values, like when I want to check the content-type to see if I can 
insert information into the page (since you can't append text to an 
image, for instance).
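Both techniques fit in one small middleware, sketched below under a few assumptions. A fake start_response only records the status and headers. After the wrapped app has produced its body, the middleware checks Content-Type, inserts a snippet into HTML pages only, and only then calls the real start_response. `InsertIntoHtml` and the snippet are hypothetical. Buffering the whole body is a deliberate simplification that gives up streaming.

```python
class InsertIntoHtml(object):
    """Insert a snippet before </body> in HTML responses; pass others through."""

    def __init__(self, app, snippet):
        self.app = app
        self.snippet = snippet  # bytes, e.g. b"<p>footer</p>"

    def __call__(self, environ, start_response):
        saved = {}
        written = []

        def fake_start_response(status, headers, exc_info=None):
            # Don't send anything upstream yet; just remember the values.
            saved["status"] = status
            saved["headers"] = list(headers)
            return written.append  # buffer any legacy write() calls

        app_iter = self.app(environ, fake_start_response)
        try:
            chunks = list(app_iter)  # start_response has been called by now
        finally:
            if hasattr(app_iter, "close"):
                app_iter.close()
        body = b"".join(written) + b"".join(chunks)

        headers = saved["headers"]
        content_type = ""
        for name, value in headers:
            if name.lower() == "content-type":
                content_type = value
        if content_type.startswith("text/html"):
            body = body.replace(b"</body>", self.snippet + b"</body>", 1)
            # The body length changed, so replace any stale Content-Length.
            headers = [(n, v) for n, v in headers if n.lower() != "content-length"]
            headers.append(("Content-Length", str(len(body))))
        # Only now is the real start_response called, with the final headers.
        start_response(saved["status"], headers)
        return [body]
```

An image response passes through byte-for-byte, since its Content-Type is not text/html.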

> If my thinking is correct, it could only be done by changing the WSGI
> specification to support the concept of trying applications in sequence,
> by allowing None as the status when "start_response" is called, to mean
> the same as when I return None from a handler. I.e., the application may
> have set headers, but otherwise the parent should where possible move on
> to a subsequent application and try it, etc.

There are several conventions that could be used for trying applications
in sequence.  For instance, you could do something like this (untested)
for delegating to different apps until one of them doesn't respond with
a 404:

class HTTPNotFound(Exception):
    """Raised internally to signal that an app responded with 404"""

class FirstFound(object):
    """Try apps in sequence until one doesn't return 404"""
    def __init__(self, apps):
        self.apps = list(apps)
    def __call__(self, environ, start_response):
        def replacement_start_response(status, headers, exc_info=None):
            if int(status.split()[0]) == 404:
                raise HTTPNotFound
            return start_response(status, headers, exc_info)
        for app in self.apps[:-1]:
            try:
                # Only catches apps that call start_response before returning;
                # a 404 signalled later, during iteration, escapes this try.
                return app(environ, replacement_start_response)
            except HTTPNotFound:
                pass
        # If the last one responds with 404, so be it
        return self.apps[-1](environ, start_response)

> Anyway, people may feel that this is totally contrary to what WSGI is all
> about and not relevant, and that is fine; I am at least finding it an
> interesting idea to play with in respect of mod_python.

It's very relevant, at least in my opinion.  This is exactly the sort of
architecture I've been attracted to, and the kind of middleware I've 
been adding to Paste.  The biggest difference is that mod_python uses an 
actual list and return values, where WSGI uses nested function calls.
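To make the list-versus-nesting contrast concrete, a flat, mod_python-style list of middleware can be folded into the nested WSGI form around the innermost application. This is a sketch only: `build_stack`, `tag` and the `factory(app) -> app` convention are assumptions for illustration, not a Paste API.

```python
from functools import reduce


def build_stack(app, middleware_factories):
    """Wrap app so the first factory in the list ends up outermost.

    build_stack(app, [a, b, c]) is equivalent to a(b(c(app))).
    """
    return reduce(lambda inner, factory: factory(inner),
                  reversed(middleware_factories), app)


def tag(label):
    """Hypothetical middleware factory that records the order it runs in."""
    def factory(app):
        def middleware(environ, start_response):
            environ.setdefault("trace", []).append(label)
            return app(environ, start_response)
        return middleware
    return factory


def app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [" > ".join(environ["trace"]).encode("ascii")]
```

Calling build_stack(app, [tag("a"), tag("b"), tag("c")]) yields a single WSGI app whose layers run in list order, so the body reads "a > b > c"; the list is still there, it has just been turned into nested calls.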

-- 
Ian Bicking  /  ianb at colorstudy.com  / http://blog.ianbicking.org

