From alice at gothcandy.com  Mon Dec 13 06:59:32 2010
From: alice at gothcandy.com (Alice Bevan–McGregor)
Date: Sun, 12 Dec 2010 21:59:32 -0800
Subject: [Web-SIG] PEP 444 / WSGI2 Proposal: Filters to supplement
	middleware.
Message-ID: <ie4co4$ok4$1@dough.gmane.org>

Howdy!

There's one pattern I've seen repeated a lot when working with WSGI1: 
middleware that processes incoming data but not outgoing data, and 
vice versa (middleware that filters the output in some way but cares 
not about the input).

Wrapping middleware around an application is simple and effective, but 
costly in terms of stack allocation overhead; it also makes debugging a 
bit more of a nightmare as the stack trace can be quite deep.
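To put a rough number on the stack-depth point, here is a small illustrative sketch (not from the PEP) that counts the frames visible to an application behind N do-nothing middleware layers versus behind N do-nothing filters run in a loop. It assumes the draft's convention of an application called with environ alone and returning a (status, headers, body) tuple; `passthrough`, `noop_filter`, `run_wrapped`, `run_filtered`, and the `depth` key are hypothetical names.

```python
import inspect


def stack_depth():
    """Count the frames on the current call stack."""
    frame = inspect.currentframe()
    depth = 0
    while frame is not None:
        depth += 1
        frame = frame.f_back
    return depth


def app(environ):
    # Record how deep the stack is when the application itself runs.
    environ["depth"] = stack_depth()
    return "200 OK", [("Content-Type", "text/plain")], [b"hello"]


def passthrough(inner):
    # A do-nothing middleware layer wrapped around `inner`.
    def middleware(environ):
        return inner(environ)
    return middleware


def noop_filter(environ):
    # A do-nothing ingress filter; it returns before the app is called.
    pass


def run_wrapped(layers):
    wrapped = app
    for _ in range(layers):
        wrapped = passthrough(wrapped)
    environ = {}
    wrapped(environ)
    return environ["depth"]


def run_filtered(filters):
    environ = {}
    for filter_ in [noop_filter] * filters:
        filter_(environ)
    app(environ)
    return environ["depth"]


print(run_wrapped(10) - run_filtered(10))  # prints 10
```

Each middleware layer adds (at least) one frame beneath the application, while each filter has already returned before the application is called, so the application's stack depth stays constant however many filters are configured.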

My updated draft PEP 444[1] includes a section describing Filters, both 
ingress (input filtering) and egress (output filtering).  The API is 
trivially simple, optional (as filters can be easily adapted as 
middleware if the host server doesn't support filters) and easy to 
implement in a server.  (The Marrow HTTP/1.1 server implements them as 
two for loops.)

Basically an input filter accepts the environment dictionary and can 
mutate it.  Ingress filters take a single positional argument that is 
the environ.  The return value is ignored.  (This is questionable; it 
may sometimes be good to have ingress filters return responses.  Not 
sure about that, though.)
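As an illustration of the ingress contract (one positional environ argument, return value ignored), here is a hypothetical filter that moves a fixed mount prefix from PATH_INFO to SCRIPT_NAME, similar in spirit to the base-path adjustments discussed later in the thread. The `PrefixFilter` name and its matching rules are assumptions, not part of the draft PEP.

```python
class PrefixFilter:
    """Hypothetical ingress filter: move a mount prefix from PATH_INFO to SCRIPT_NAME."""

    def __init__(self, prefix):
        # Normalise to "/prefix" with no trailing slash.
        self.prefix = "/" + prefix.strip("/")

    def __call__(self, environ):
        path = environ.get("PATH_INFO", "")
        # Only match whole path segments, so "/blog" does not match "/blogger".
        if path == self.prefix or path.startswith(self.prefix + "/"):
            environ["SCRIPT_NAME"] = environ.get("SCRIPT_NAME", "") + self.prefix
            environ["PATH_INFO"] = path[len(self.prefix):]
        # Ingress filters mutate environ in place; nothing is returned.


environ = {"SCRIPT_NAME": "", "PATH_INFO": "/blog/2010/12"}
PrefixFilter("/blog")(environ)
print(environ["SCRIPT_NAME"], environ["PATH_INFO"])  # /blog /2010/12
```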

An egress filter accepts the (status, headers, body) tuple from the 
application and returns a (status, headers, body) tuple of its own, 
which then replaces the response.  An example implementation is:

	for filter_ in ingress_filters:
	    filter_(environ)
	
	response = application(environ)
	
	for filter_ in egress_filters:
	    response = filter_(*response)
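The two loops above can be made runnable roughly as follows. This is a minimal sketch assuming, as in the draft, that the application is called with environ alone and returns a (status, headers, body) tuple; `handle_request`, `hello`, `add_name`, and `add_powered_by` are hypothetical names.

```python
def handle_request(application, environ, ingress_filters=(), egress_filters=()):
    """Run ingress filters, then the application, then egress filters."""
    # Ingress: each filter may mutate environ; return values are ignored.
    for filter_ in ingress_filters:
        filter_(environ)

    # The application returns a (status, headers, body) tuple.
    response = application(environ)

    # Egress: each filter receives the current tuple and returns a replacement.
    for filter_ in egress_filters:
        response = filter_(*response)

    return response


def hello(environ):
    return "200 OK", [("Content-Type", "text/plain")], [b"Hello, " + environ["name"]]


def add_name(environ):
    environ["name"] = b"world"


def add_powered_by(status, headers, body):
    return status, headers + [("X-Powered-By", "filters")], body


status, headers, body = handle_request(hello, {}, [add_name], [add_powered_by])
print(status, dict(headers)["X-Powered-By"], b"".join(body))  # 200 OK filters b'Hello, world'
```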

I'd love to get some input on this.  Questions, comments, criticisms, 
or better ideas are welcome!

	- Alice.

[1] https://github.com/GothAlice/wsgi2/blob/master/pep-0444.rst


From fumanchu at aminus.org  Mon Dec 13 17:20:24 2010
From: fumanchu at aminus.org (Robert Brewer)
Date: Mon, 13 Dec 2010 08:20:24 -0800
Subject: [Web-SIG] PEP 444 / WSGI2 Proposal: Filters to
	supplement middleware.
In-Reply-To: <ie4co4$ok4$1@dough.gmane.org>
References: <ie4co4$ok4$1@dough.gmane.org>
Message-ID: <F1962646D3B64642B7C9A06068EE1E640FC1D322@ex10.hostedexchange.local>

Alice Bevan–McGregor wrote:
> An egress filter accepts the (status, headers, body) tuple from the
> application and returns a (status, headers, body) tuple of its own,
> which then replaces the response.  An example implementation is:
> 
> 	for filter_ in ingress_filters:
> 	    filter_(environ)
> 
> 	response = application(environ)
> 
> 	for filter_ in egress_filters:
> 	    response = filter_(*response)

That looks amazingly like the code for CherryPy Filters circa 2005. In version 2 of CherryPy, "Filters" were the canonical extension method (for the framework, not WSGI, but the same lessons apply). It was still expensive in terms of stack allocation overhead, because you had to call each filter to see if it was "on". It would be much better to find a way to write something like:

    for f in ingress_filters:
        if f.on:
            f(environ)

It was also fiendishly difficult to get filters executed in the right order: if you had a filter that was both ingress and egress, the natural tendency for core developers and users alike was to append each to each list, but this is almost never the correct order.

But even if you solve the issue of static composition, there's still a demand for programmatic composition ("if X then add Y after it"), and even decomposition ("find the caching filter my framework added automatically and turn it off"), and list.insert()/remove() isn't stellar at that.

Calling the filter to ask it whether it is "on" also leads filter developers down the wrong path; you really don't want to have Filter A trying to figure out if some other, conflicting Filter B has already run (or will run soon) that demands Filter A return without executing anything. You really, really want the set of filters to be both statically defined and statically analyzable.

Finally, you want the execution of filters to be configurable per URI and also configurable per controller. So the above should be rewritten again to something like:

    for f in ingress_filters(controller):
        if f.on(environ['PATH_INFO']):
            f(environ)

It was for these reasons that CherryPy 3 ditched its version 2 "filters" and replaced them with "hooks and tools" in version 3. You might find more insight by studying the latest cherrypy/_cptools.py


Robert Brewer
fumanchu at aminus.org

From alice at gothcandy.com  Mon Dec 13 20:42:02 2010
From: alice at gothcandy.com (Alice Bevan–McGregor)
Date: Mon, 13 Dec 2010 11:42:02 -0800
Subject: [Web-SIG] PEP 444 / WSGI2 Proposal: Filters to
	supplement middleware.
References: <ie4co4$ok4$1@dough.gmane.org>
	<F1962646D3B64642B7C9A06068EE1E640FC1D322@ex10.hostedexchange.local>
Message-ID: <ie5sua$c5j$1@dough.gmane.org>

> That looks amazingly like the code for CherryPy Filters circa 2005. In 
> version 2 of CherryPy, "Filters" were the canonical extension method 
> (for the framework, not WSGI, but the same lessons apply). It was still 
> expensive in terms of stack allocation overhead, because you had to 
> call each filter to see if it was "on". It would be much better to 
> find a way to write something like:
> 
>     for f in ingress_filters:
>         if f.on:
>             f(environ)

.on will need to be an @property in most cases, still not avoiding 
stack allocation and, in fact, doubling the overhead per filter.  
Statically disabled filters should not be added to the filter list.
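One way to read "statically disabled filters should not be added to the filter list" is to resolve each filter's on/off setting once, when the list is built, so the per-request loop never asks. A minimal sketch; the `(enabled, filter)` configuration pairs and the filter functions are hypothetical.

```python
def build_filter_list(configured):
    """Resolve on/off once, at startup: keep only the enabled filters."""
    return [filter_ for enabled, filter_ in configured if enabled]


def lowercase_host(environ):
    # Hypothetical ingress filter: normalise the Host header's case.
    environ["HTTP_HOST"] = environ.get("HTTP_HOST", "").lower()


def debug_marker(environ):
    # Hypothetical ingress filter, disabled in this configuration.
    environ["debug"] = True


ingress_filters = build_filter_list([(True, lowercase_host), (False, debug_marker)])

environ = {"HTTP_HOST": "Example.COM"}
for filter_ in ingress_filters:  # no per-request "is it on?" check
    filter_(environ)
print(environ)  # {'HTTP_HOST': 'example.com'}
```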

> It was also fiendishly difficult to get executed in the right order: if 
> you had a filter that was both ingress and egress, the natural tendency 
> for core developers and users alike was to append each to each list, 
> but this is almost never the correct order.

If something is both an ingress and egress filter, it should be 
implemented as middleware instead.  Nothing can prevent developers from 
doing bad things if they really try.  Appending to ingress and 
prepending to egress would be the "right" thing to simulate middleware 
behaviour with filters, but again, don't do that.  ;)
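The ordering rule can be checked directly. In nested middleware the outermost layer sees the request first and the response last, so a filter pair that simulates middleware needs its ingress half appended and its egress half prepended. A small sketch with hypothetical helpers (`make_middleware`, `make_pair`) that records the call sequence both ways:

```python
calls = []


def make_middleware(name, inner):
    # Classic nested middleware: pre-processing, inner call, post-processing.
    def middleware(environ):
        calls.append(name + ":in")
        response = inner(environ)
        calls.append(name + ":out")
        return response
    return middleware


def app(environ):
    calls.append("app")
    return "200 OK", [], []


def make_pair(name):
    # The same behaviour split into an ingress filter and an egress filter.
    def ingress(environ):
        calls.append(name + ":in")

    def egress(status, headers, body):
        calls.append(name + ":out")
        return status, headers, body

    return ingress, egress


# Middleware: A wraps B, which wraps the app.
make_middleware("A", make_middleware("B", app))({})
nested = calls[:]

# Filters: append each ingress half, prepend each egress half.
calls.clear()
ingress_filters, egress_filters = [], []
for name in ("A", "B"):
    ingress, egress = make_pair(name)
    ingress_filters.append(ingress)
    egress_filters.insert(0, egress)

environ = {}
for filter_ in ingress_filters:
    filter_(environ)
response = app(environ)
for filter_ in egress_filters:
    response = filter_(*response)

print(nested == calls, calls)  # True ['A:in', 'B:in', 'app', 'B:out', 'A:out']
```

Appending both halves instead would run A's egress half before B's, which is exactly the mis-ordering that arises when each half is simply appended to its list.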

> But even if you solve the issue of static composition, there's still a 
> demand for programmatic composition ("if X then add Y after it"), and 
> even decomposition ("find the caching filter my framework added 
> automatically and turn it off"), and list.insert()/remove() isn't 
> stellar at that.

I have plans for (and a partial implementation of) an init.d-style 
"needs/uses/provides" declaration with automatic dependency graphing.  
WebCore, for example, adds these declarations to existing middleware 
layers so that it can sort the middleware stack.
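A rough sketch of the init.d-style idea, assuming each filter declares hypothetical `provides`, `needs`, and `uses` attributes (illustrative names, not WebCore's or marrow's actual API). Treating `uses` as a soft dependency (order after it only if something provides it) is an assumption. The ordering itself is done by the standard library's `graphlib.TopologicalSorter` (Python 3.9+).

```python
from graphlib import TopologicalSorter  # Python 3.9+


class Filter:
    """Hypothetical filter declaration carrying init.d-style metadata."""

    def __init__(self, name, provides=(), needs=(), uses=()):
        self.name = name
        self.provides = set(provides)
        self.needs = set(needs)  # hard dependencies: must be present
        self.uses = set(uses)    # soft dependencies: order after them if present

    def __repr__(self):
        return self.name


def sort_filters(filters):
    """Order filters so each runs after the filters it needs or uses."""
    providers = {}
    for f in filters:
        for capability in f.provides:
            providers.setdefault(capability, []).append(f)

    graph = {}
    for f in filters:
        deps = []
        for need in f.needs:
            if need not in providers:
                raise LookupError(f"{f.name} needs {need!r}, which no filter provides")
            deps.extend(providers[need])
        for use in f.uses:
            deps.extend(providers.get(use, []))
        graph[f] = deps

    # static_order() yields each node after its predecessors; raises CycleError on cycles.
    return list(TopologicalSorter(graph).static_order())


cookies = Filter("cookies", provides=["cookies"])
session = Filter("session", provides=["session"], needs=["cookies"])
auth = Filter("auth", needs=["session"])

print(sort_filters([auth, session, cookies]))  # [cookies, session, auth]
```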

> Calling the filter to ask it whether it is "on" also leads filter 
> developers down the wrong path; you really don't want to have Filter A 
> trying to figure out if some other, conflicting Filter B has already 
> run (or will run soon) that demands Filter A return without executing 
> anything. You really, really want the set of filters to be both 
> statically defined and statically analyzable.

Unfortunately, most, if not all, filters need to check request headers 
and response headers to determine whether they can run.  E.g. 
compression checks environ.get('HTTP_ACCEPT_ENCODING', '').lower() for 
'gzip', and checks the response to determine if a 'Content-Encoding' 
header has already been specified.
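A minimal sketch of the compression example as an egress filter. It hypothetically assumes environ is passed as the first argument, followed by status, headers (a list of (name, value) tuples), and body (an iterable of byte strings); `gzip_filter` and `accepts_gzip` are hypothetical names. The Accept-Encoding parse is simplified (only an explicit `gzip` token counts, `*` is ignored), the whole body is buffered, and the range-request and ETag interactions raised elsewhere in the thread are not addressed.

```python
import gzip


def accepts_gzip(environ):
    """True if Accept-Encoding lists gzip without q=0 (a simplified parse)."""
    for item in environ.get("HTTP_ACCEPT_ENCODING", "").split(","):
        coding, _, params = item.partition(";")
        if coding.strip().lower() != "gzip":
            continue
        params = params.replace(" ", "")
        if params.startswith("q="):
            try:
                return float(params[2:]) > 0
            except ValueError:
                return False
        return True
    return False


def gzip_filter(environ, status, headers, body):
    """Hypothetical egress filter: gzip the body when the client allows it."""
    names = {name.lower() for name, _ in headers}
    # Leave the response alone if gzip isn't acceptable, it's already encoded,
    # or the response must not carry a body.
    if (
        not accepts_gzip(environ)
        or "content-encoding" in names
        or environ.get("REQUEST_METHOD") == "HEAD"
        or status[:3] in ("204", "304")
    ):
        return status, headers, body

    compressed = gzip.compress(b"".join(body))  # buffers the whole body
    headers = [(n, v) for n, v in headers if n.lower() != "content-length"]
    headers += [
        ("Content-Encoding", "gzip"),
        ("Content-Length", str(len(compressed))),
        ("Vary", "Accept-Encoding"),
    ]
    return status, headers, [compressed]


environ = {"HTTP_ACCEPT_ENCODING": "gzip, deflate"}
status, headers, body = gzip_filter(
    environ, "200 OK", [("Content-Type", "text/plain")], [b"hello " * 100]
)
print(dict(headers)["Content-Encoding"], gzip.decompress(body[0]) == b"hello " * 100)  # gzip True
```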

> Finally, you want the execution of filters to be configurable per URI 
> and also configurable per controller. So the above should be rewritten 
> again to something like:
> 
>     for f in ingress_filters(controller):
>         if f.on(environ['PATH_INFO']):
>             f(environ)
> 
> It was for these reasons that CherryPy 3 ditched its version 2 
> "filters" and replaced them with "hooks and tools" in version 3.

This is possible by wrapping multiple applications, say, in the filter 
middleware adapter with differing filter setups, then using the 
separate wrapped applications with some form of dispatch.  You could 
also utilize filters as decorators.  This is an implementation detail 
left up to the framework utilizing WSGI2, however.  WSGI2 itself has no 
concept of "controllers".
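A sketch of the approach described above: wrap each application in a filter-applying adapter with its own filter lists, then dispatch between the wrapped applications by path prefix. `with_filters`, `dispatch`, `page`, and `add_debug_comment` are hypothetical names; it assumes the draft's (status, headers, body) return convention, and unlike real dispatchers such as paste.urlmap it does not adjust SCRIPT_NAME or PATH_INFO.

```python
def with_filters(application, ingress=(), egress=()):
    """Hypothetical adapter: present an app plus its filters as one application."""
    def wrapped(environ):
        for filter_ in ingress:
            filter_(environ)
        response = application(environ)
        for filter_ in egress:
            response = filter_(*response)
        return response
    return wrapped


def dispatch(routes):
    """Call the first application whose path prefix matches PATH_INFO."""
    def dispatcher(environ):
        path = environ.get("PATH_INFO", "")
        # Plain startswith(): "/admin" would also match "/administrator".
        for prefix, application in routes:
            if path.startswith(prefix):
                return application(environ)
        return "404 Not Found", [("Content-Type", "text/plain")], [b"Not Found"]
    return dispatcher


def page(environ):
    return "200 OK", [("Content-Type", "text/html")], [b"<p>page</p>"]


def add_debug_comment(status, headers, body):
    # Debug-only egress filter, applied to the /admin subtree alone.
    return status, headers, list(body) + [b"<!-- debug -->"]


site = dispatch([
    ("/admin", with_filters(page, egress=[add_debug_comment])),
    ("/", page),
])
print(b"".join(site({"PATH_INFO": "/admin/users"})[2]))  # b'<p>page</p><!-- debug -->'
print(b"".join(site({"PATH_INFO": "/about"})[2]))        # b'<p>page</p>'
```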

None of this prevents the simplified stack from being useful during 
exception handling, though.  ;)  What I was really trying to do is 
reduce the level of nesting on each request and make what used to be 
middleware more explicit in its purpose.

> You might find more insight by studying the latest cherrypy/_cptools.py

I'll give it a gander, though I firmly believe filter management (as 
middleware stack management) is the domain of a framework on top of 
WSGI2, not the domain of the protocol.

	- Alice.


From ianb at colorstudy.com  Tue Dec 14 21:02:33 2010
From: ianb at colorstudy.com (Ian Bicking)
Date: Tue, 14 Dec 2010 12:02:33 -0800
Subject: [Web-SIG] PEP 444 / WSGI2 Proposal: Filters to supplement
	middleware.
In-Reply-To: <ie4co4$ok4$1@dough.gmane.org>
References: <ie4co4$ok4$1@dough.gmane.org>
Message-ID: <AANLkTikwooNrRmVhvLV1OBrQshRnL7iinesJAj6Amkgs@mail.gmail.com>

On Sun, Dec 12, 2010 at 9:59 PM, Alice Bevan–McGregor
<alice at gothcandy.com> wrote:

> My updated draft PEP 444[1] includes a section describing Filters, both
> ingress (input filtering) and egress (output filtering).  The API is
> trivially simple, optional (as filters can be easily adapted as middleware
> if the host server doesn't support filters) and easy to implement in a
> server.  (The Marrow HTTP/1.1 server implements them as two for loops.)
>

It's not clear to me how this can be composed or abstracted.

@webob.dec.wsgify does kind of handle this with its request/response
pattern; in a simplified form it's like:

def wsgify(func):
    def replacement(environ):
        req = Request(environ)
        resp = func(req)
        return resp(environ)
    return replacement

This allows you to do an output filter like:

@wsgify
def output_filter(req):
    resp = some_app(req.environ)
    fiddle_with_resp(resp)
    return resp

(Most output filters also need the request.)  And an input filter like:

@wsgify
def input_filter(req):
    fiddle_with_req(req)
    return some_app


But while it handles the input filter case, it doesn't try to generalize
this or move application composition into the server.  An application is an
application and servers are imagined but not actually concrete.  If you
handle filters at the server level you have to have some way of registering
these filters, and it's unclear what order they should be applied.  At
import?  Does the server have to poke around in the app it is running?  How
can it traverse down if you have dispatching apps (like paste.urlmap or
Routes)?

You can still implement this locally of course, as a class that takes an app
and input and output filters.
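That local alternative could look roughly like this: one object bundling an application with its input and output filters, so it can be handed around (or wrapped in middleware, or mounted by a dispatcher) like any other application. The `FilteredApplication` name is hypothetical; the sketch assumes the draft's `application(environ) -> (status, headers, body)` convention and an egress signature that also receives environ, since most output filters need the request.

```python
class FilteredApplication:
    """Hypothetical wrapper: an app plus ingress/egress filters, callable as an app."""

    def __init__(self, application, ingress=(), egress=()):
        self.application = application
        self.ingress = list(ingress)
        self.egress = list(egress)

    def __call__(self, environ):
        for filter_ in self.ingress:
            filter_(environ)  # mutate environ; return value ignored
        status, headers, body = self.application(environ)
        for filter_ in self.egress:
            # Egress filters also receive environ, since most need the request.
            status, headers, body = filter_(environ, status, headers, body)
        return status, headers, body


def app(environ):
    return "200 OK", [("Content-Type", "text/plain")], [environ["PATH_INFO"].encode()]


def lowercase_path(environ):
    environ["PATH_INFO"] = environ["PATH_INFO"].lower()


def echo_method(environ, status, headers, body):
    return status, headers + [("X-Method", environ["REQUEST_METHOD"])], body


wrapped = FilteredApplication(app, [lowercase_path], [echo_method])
print(wrapped({"REQUEST_METHOD": "GET", "PATH_INFO": "/Hello"}))
# ('200 OK', [('Content-Type', 'text/plain'), ('X-Method', 'GET')], [b'/hello'])
```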


-- 
Ian Bicking  |  http://blog.ianbicking.org

From alice at gothcandy.com  Tue Dec 14 21:54:12 2010
From: alice at gothcandy.com (Alice Bevan–McGregor)
Date: Tue, 14 Dec 2010 12:54:12 -0800
Subject: [Web-SIG] PEP 444 / WSGI2 Proposal: Filters to supplement
	middleware.
References: <ie4co4$ok4$1@dough.gmane.org>
	<AANLkTikwooNrRmVhvLV1OBrQshRnL7iinesJAj6Amkgs@mail.gmail.com>
Message-ID: <ie8lhk$dg0$1@dough.gmane.org>

Ian,

> It's not clear to me how this can be composed or abstracted.

Filters themselves have no knowledge of the application, in a similar 
vein to middleware not knowing if the next layer in the onion skin is 
the final application or another bit of well-behaved middleware, 
except that filters do not get a reference to an "inner" application at 
all.  (They are linear, not nested.)

> (Most output filters also need the request.)

You are quite correct; I'll update the PEP.  marrow.server.http already 
passes environ to egress filters in addition to the status_bytes, 
headers_list, and body_iter values.

> But while it handles the input filter case, it doesn't try to 
> generalize this or move application composition into the server.

A large proportion of the filters I was able to imagine are 
conditionless: there would be no "path" within your application 
(controller or otherwise) that would need to modify the majority of 
them.  As an example, egress compression.  (And even then, my example 
egress compression filter offers a documented mechanism to disable it 
on a per-request basis.)

> An application is an application and servers are imagined but not 
> actually concrete.

Could you elaborate?  (Define "concrete" in this context.)

> If you handle filters at the server level you have to have some way of 
> registering these filters, and it's unclear what order they should be 
> applied.  At import?  Does the server have to poke around in the app it 
> is running?  How can it traverse down if you have dispatching apps 
> (like paste.urlmap or Routes)?

Filters are unaffected by, and unaware of, dispatch.  They are defined 
at the same time your application middleware stack is constructed, and 
passed (in the current implementation) to the HTTPServer protocol as a 
list at the same time as your wrapped application stack.

> You can still implement this locally of course, as a class that takes 
> an app and input and output filters.

If you -do- need "region specific" filtering, you can ostensibly wrap 
multiple final applications in filter management middleware, as you 
say.  That's a fairly advanced use-case regardless of filtering.

I would love to see examples of what people might implement as filters 
(i.e. middleware that does ONE of ingress or egress processing, not 
both).  From CherryPy I see things like:

 * BaseURLFilter (ingress Apache base path adjustments)
 * DecodingFilter (ingress request parameter decoding)
 * EncodingFilter (egress response header and body encoding)
 * GzipFilter (already mentioned)
 * LogDebugInfoFilter (egress insertion of page generation time into 
HTML stream)
 * TidyFilter (egress piping of response body to Tidy)
 * VirtualHostFilter (similar to BaseURLFilter)

I can't imagine any of these (with the possible exception of 
LogDebugInfoFilter) needing to be path-specific.

	- Alice.


From ianb at colorstudy.com  Tue Dec 14 22:25:58 2010
From: ianb at colorstudy.com (Ian Bicking)
Date: Tue, 14 Dec 2010 13:25:58 -0800
Subject: [Web-SIG] PEP 444 / WSGI2 Proposal: Filters to supplement
	middleware.
In-Reply-To: <ie8lhk$dg0$1@dough.gmane.org>
References: <ie4co4$ok4$1@dough.gmane.org>
	<AANLkTikwooNrRmVhvLV1OBrQshRnL7iinesJAj6Amkgs@mail.gmail.com>
	<ie8lhk$dg0$1@dough.gmane.org>
Message-ID: <AANLkTincp1m+5e-jzJm7zGOsf0cCggVQ4RxnzgMAk5Bj@mail.gmail.com>

On Tue, Dec 14, 2010 at 12:54 PM, Alice Bevan–McGregor
<alice at gothcandy.com> wrote:

>> An application is an application and servers are imagined but not actually
>> concrete.
>
> Could you elaborate?  (Define "concrete" in this context.)


WSGI applications never directly touch the server.  They are called by the
server, but have no reference to the server.  Servers in turn take an app
and parameters specific to their serveryness (which may or may not even
involve HTTP), but it's good we've gotten them out of the realm of
application composition (early on WSGI servers frequently handled mounting
apps at locations in the path, but that's been replaced with dispatching
middleware).  An application wrapped with middleware is also a single object
you can hand around; we don't have an object that represents all of
"application, list of pre-filters, list of post-filters".


>> If you handle filters at the server level you have to have some way of
>> registering these filters, and it's unclear what order they should be
>> applied.  At import?  Does the server have to poke around in the app it is
>> running?  How can it traverse down if you have dispatching apps (like
>> paste.urlmap or Routes)?
>
> Filters are unaffected by, and unaware of, dispatch.  They are defined at
> the same time your application middleware stack is constructed, and passed
> (in the current implementation) to the HTTPServer protocol as a list at the
> same time as your wrapped application stack.
>
>> You can still implement this locally of course, as a class that takes an
>> app and input and output filters.
>
> If you -do- need "region specific" filtering, you can ostensibly wrap
> multiple final applications in filter management middleware, as you say.
> That's a fairly advanced use-case regardless of filtering.
>
> I would love to see examples of what people might implement as filters
> (i.e. middleware that does ONE of ingress or egress processing, not both).
> From CherryPy I see things like:
>
> * BaseURLFilter (ingress Apache base path adjustments)
> * DecodingFilter (ingress request parameter decoding)
> * EncodingFilter (egress response header and body encoding)
> * GzipFilter (already mentioned)
> * LogDebugInfoFilter (egress insertion of page generation time into HTML
> stream)
> * TidyFilter (egress piping of response body to Tidy)
> * VirtualHostFilter (similar to BaseURLFilter)
>
> None of these (with the possible exception of LogDebugInfoFilter) I could
> imagine needing to be path-specific.


GzipFilter is wonky at best (it interacts oddly with range requests and
etags).  Prefix handling is useful (e.g.,
paste.deploy.config.PrefixMiddleware), and usually global and unconfigured.
Debugging and logging stuff often needs per-path configuration, which can
mean multiple instances applied after dispatch.  Encoding and Decoding don't
apply to WSGI.  Tidy is intrusive and I think questionable on a global
level.  I don't think the use cases are there.  Tightly bound pre-filters
and post-filters are particularly problematic.  This all seems like a lot of
work to avoid a few stack frames in a traceback.

-- 
Ian Bicking  |  http://blog.ianbicking.org

From alice at gothcandy.com  Fri Dec 24 10:07:30 2010
From: alice at gothcandy.com (Alice Bevan–McGregor)
Date: Fri, 24 Dec 2010 01:07:30 -0800
Subject: [Web-SIG] PEP 444 Draft Rewrite
Message-ID: <if1nsi$kgd$1@dough.gmane.org>

Howdy!

I've mostly finished a draft rewrite of PEP 444 (WSGI 2), incorporating 
some additional ideas covering things like py2k/py3k interoperability 
and switching from a more narrative style to a substantially 
RFC-inspired language.

	http://bit.ly/e7rtI6

I'm using Textile as my intermediary format, and will obviously need to 
convert this to reStructuredText when I'm done.  Missing are:

 * The majority of the examples.
 * Narrative rationale, which I'll be writing shortly.
 * Narrative Python compatibility documentation.
 * Asynchronous documentation.  This will likely rely on the abstract 
API defined in PEP 3148 (futures) as implemented in Python 3.2 and the 
"futures" package available on PyPI.
 * Additional and complete references.  The Rationale chapter will add 
many references to community discussion.

I would greatly appreciate it if people could read through this rewrite 
and raise questions, corrections, or even pointers to possible 
ambiguities in discussion.

Happy holidays and a merry new year, everybody!  :)

	- Alice.

P.s. I'll be updating my PEP 444 reference implementation HTTP 1.1 
server (marrow.server.http) over the holidays to incorporate the 
changes in this rewrite; most notably the separation of byte strings, 
unicode strings, and native strings.