[Web-SIG] PEP 444 / WSGI2 Proposal: Filters to supplement middleware.

Mon Dec 13 17:20:24 CET 2010

Alice Bevan–McGregor
> There's one issue I've seen repeated a lot in working with WSGI1 and
> that is the use of middleware to process incoming data, but not
> outgoing, and vice-versa; middleware which filters the output in some
> way, but cares not about the input.
> 
> Wrapping middleware around an application is simple and effective, but
> costly in terms of stack allocation overhead; it also makes debugging a
> bit more of a nightmare as the stack trace can be quite deep.
> 
> My updated draft PEP 444[1] includes a section describing Filters, both
> ingress (input filtering) and egress (output filtering).  The API is
> trivially simple, optional (as filters can be easily adapted as
> middleware if the host server doesn't support filters) and easy to
> implement in a server.  (The Marrow HTTP/1.1 server implements them as
> two for loops.)
> 
> Basically an input filter accepts the environment dictionary and can
> mutate it.  Ingress filters take a single positional argument that is
> the environ.  The return value is ignored.  (This is questionable; it
> may sometimes be good to have ingress filters return responses.  Not
> sure about that, though.)
> 
> An egress filter accepts the status, headers, body tuple from the
> application and returns a status, headers, and body tuple of its own
> which then replaces the response.  An example implementation is:
> 
> 	for filter_ in ingress_filters:
> 	    filter_(environ)
> 
> 	response = application(environ)
> 
> 	for filter_ in egress_filters:
> 	    response = filter_(*response)

That looks amazingly like the code for CherryPy Filters circa 2005. In version 2 of CherryPy, "Filters" were the canonical extension method (for the framework, not WSGI, but the same lessons apply). It was still expensive in terms of stack allocation overhead, because you had to call each filter just to find out whether it was "on". It would be much better to find a way to write something like:

    for f in ingress_filters:
        if f.on:
            f(environ)
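
A minimal runnable sketch of that loop, assuming each filter is a small object carrying a plain boolean `on` attribute that the server can read without making a call. The names `Filter`, `run_ingress`, `add_request_id` and `strip_cookies` are made up for illustration; none of them come from CherryPy or the PEP 444 draft.

```python
class Filter:
    """Hypothetical ingress filter: a callable plus a cheap 'on' flag."""

    def __init__(self, func, on=True):
        self.func = func
        self.on = on

    def __call__(self, environ):
        self.func(environ)


def run_ingress(ingress_filters, environ):
    """Call only the enabled filters; a disabled one costs an attribute lookup."""
    called = []
    for f in ingress_filters:
        if f.on:
            f(environ)
            called.append(f)
    return called


def add_request_id(environ):
    # Illustrative mutation of the environ; the key name is an assumption.
    environ["example.request_id"] = "abc123"


def strip_cookies(environ):
    environ.pop("HTTP_COOKIE", None)


filters = [Filter(add_request_id), Filter(strip_cookies, on=False)]
environ = {"PATH_INFO": "/", "HTTP_COOKIE": "session=1"}
called = run_ingress(filters, environ)
```

Here `strip_cookies` is never entered, so no stack frame is spent on a filter that would only have returned immediately.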

It was also fiendishly difficult to get filters executed in the right order: if you had a filter that was both ingress and egress, the natural tendency for core developers and users alike was to append its ingress half to the ingress list and its egress half to the egress list, but this is almost never the correct order. But even if you solve the issue of static composition, there's still a demand for programmatic composition ("if X then add Y after it"), and even decomposition ("find the caching filter my framework added automatically and turn it off"), and list.insert()/remove() aren't stellar at that. Calling the filter to ask it whether it is "on" also leads filter developers down the wrong path; you really don't want to have Filter A trying to figure out whether some other, conflicting Filter B has already run (or will run soon) and so demands that Filter A return without executing anything. You really, really want the set of filters to be both statically defined and statically analyzable.
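
To make the ordering problem concrete, here is a small sketch that records the order in which two paired filters run. It assumes the PEP 444 style application that returns a (status, headers, body) tuple, and it assumes that the order a paired filter usually wants is the nested one that wrapping middleware would give; that second assumption is mine, as are the names `make_pair`, `serve` and `trace`.

```python
trace = []


def make_pair(name):
    """Build a hypothetical filter with an ingress half and an egress half."""
    def ingress(environ):
        trace.append(name + ".in")

    def egress(status, headers, body):
        trace.append(name + ".out")
        return status, headers, body

    return ingress, egress


def application(environ):
    trace.append("app")
    return "200 OK", [("Content-Type", "text/plain")], [b"hello"]


def serve(ingress_filters, egress_filters, environ):
    """The two-for-loops server from the quoted proposal."""
    for f in ingress_filters:
        f(environ)
    response = application(environ)
    for f in egress_filters:
        response = f(*response)
    return response


# The "natural tendency": append each half to its own list.
ingress_filters, egress_filters = [], []
for name in ("A", "B"):
    ingress, egress = make_pair(name)
    ingress_filters.append(ingress)
    egress_filters.append(egress)

serve(ingress_filters, egress_filters, {})
appended_order = list(trace)

# Nested order, as middleware wrapping would produce: egress runs in reverse.
del trace[:]
serve(ingress_filters, list(reversed(egress_filters)), {})
nested_order = list(trace)
```

With plain appends, A's egress half runs before B's, so A no longer brackets B the way it would as middleware; reversing the egress list restores the bracketing, but only for the simple static case.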

Finally, you want the execution of filters to be configurable per URI and also configurable per controller. So the above should be rewritten again to something like:

    for f in ingress_filters(controller):
        if f.on(environ['PATH_INFO']):
            f(environ)
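
A runnable sketch of that shape, assuming filters are declared statically on a controller class and that `on` is a cheap predicate over the request path. `PathFilter`, `Controller`, `AdminController` and the `example.ran` key are hypothetical names used only for illustration (this is not how CherryPy's tools are implemented), and the path is read from the standard WSGI `PATH_INFO` key.

```python
class PathFilter:
    """Hypothetical filter that is only 'on' under a given path prefix."""

    def __init__(self, name, prefix):
        self.name = name
        self.prefix = prefix

    def on(self, path):
        return path.startswith(self.prefix)

    def __call__(self, environ):
        # Record that this filter ran; a real filter would mutate environ.
        environ.setdefault("example.ran", []).append(self.name)


class Controller:
    """Hypothetical controller; filters are declared as a class attribute."""
    filters = ()


class AdminController(Controller):
    filters = (
        PathFilter("auth", "/admin"),
        PathFilter("audit", "/admin/users"),
    )


def ingress_filters(controller):
    """Return the filters declared for this controller, in declared order."""
    return list(controller.filters)


def run(controller, environ):
    for f in ingress_filters(controller):
        if f.on(environ["PATH_INFO"]):
            f(environ)
    return environ.get("example.ran", [])


deep = run(AdminController, {"PATH_INFO": "/admin/users/42"})
shallow = run(AdminController, {"PATH_INFO": "/admin"})
none = run(Controller, {"PATH_INFO": "/admin"})
```

Because the filter set hangs off the controller as data, it can be inspected without executing any filter, which is the "statically analyzable" property argued for above.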

It was for these reasons that CherryPy 3 ditched the version 2 "filters" and replaced them with "hooks and tools". You might find more insight by studying the latest cherrypy/_cptools.py.

Robert Brewer
fumanchu at aminus.org