From alice at gothcandy.com Mon Dec 13 06:59:32 2010
From: alice at gothcandy.com (Alice Bevan–McGregor)
Date: Sun, 12 Dec 2010 21:59:32 -0800
Subject: [Web-SIG] PEP 444 / WSGI2 Proposal: Filters to suppliment middleware.

Howdy!

There's one issue I've seen repeated a lot in working with WSGI1: the use of middleware to process incoming data but not outgoing, and vice versa: middleware which filters the output in some way, but cares not about the input.

Wrapping middleware around an application is simple and effective, but costly in terms of stack allocation overhead; it also makes debugging a bit more of a nightmare, as the stack trace can be quite deep.

My updated draft PEP 444[1] includes a section describing Filters, both ingress (input filtering) and egress (output filtering). The API is trivially simple, optional (as filters can easily be adapted as middleware if the host server doesn't support filters), and easy to implement in a server. (The Marrow HTTP/1.1 server implements them as two for loops.)

Basically, an input filter accepts the environment dictionary and can mutate it. Ingress filters take a single positional argument, the environ, and their return value is ignored. (This is questionable; it may sometimes be good to have ingress filters return responses. Not sure about that, though.)

An egress filter accepts the status, headers, body tuple from the application and returns a status, headers, and body tuple of its own, which then replaces the response. An example implementation is:

    for filter_ in ingress_filters:
        filter_(environ)

    response = application(environ)

    for filter_ in egress_filters:
        response = filter_(*response)

I'd love to get some input on this. Questions, comments, criticisms, or better ideas are welcome!

- Alice.
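A runnable version of the two-loop filter model described above may make the calling convention concrete. This is a sketch under stated assumptions: the toy application and the two filters (strip_trailing_slash, add_powered_by) are hypothetical names, not part of the draft PEP, and native str is used for status and headers purely for brevity; the draft's exact string-type rules are not reproduced here.

```python
# Minimal sketch of the two-loop filter model. The application and both
# filters are hypothetical; status/headers use native str for brevity.


def application(environ):
    """Toy application: takes only environ, returns (status, headers, body)."""
    body = [("You asked for %s" % environ["PATH_INFO"]).encode("utf-8")]
    return "200 OK", [("Content-Type", "text/plain")], body


def strip_trailing_slash(environ):
    """Ingress filter: mutates environ in place; its return value is ignored."""
    path = environ.get("PATH_INFO", "")
    if len(path) > 1 and path.endswith("/"):
        environ["PATH_INFO"] = path.rstrip("/") or "/"


def add_powered_by(status, headers, body):
    """Egress filter: returns a replacement (status, headers, body) tuple."""
    return status, headers + [("X-Powered-By", "example")], body


ingress_filters = [strip_trailing_slash]
egress_filters = [add_powered_by]


def handle(environ):
    """What a filter-aware server might do for each request."""
    for filter_ in ingress_filters:
        filter_(environ)

    response = application(environ)

    for filter_ in egress_filters:
        response = filter_(*response)

    return response


status, headers, body = handle({"PATH_INFO": "/hello/"})
```

Note that the application never sees the filters: they run as flat loops in the server, so a traceback from inside the application is no deeper than it would be without them.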
[1] https://github.com/GothAlice/wsgi2/blob/master/pep-0444.rst

From fumanchu at aminus.org Mon Dec 13 17:20:24 2010
From: fumanchu at aminus.org (Robert Brewer)
Date: Mon, 13 Dec 2010 08:20:24 -0800
Subject: [Web-SIG] PEP 444 / WSGI2 Proposal: Filters to suppliment middleware.

Alice Bevan–McGregor wrote:
> There's one issue I've seen repeated a lot in working with WSGI1 and
> that is the use of middleware to process incoming data, but not
> outgoing, and vice-versa; middleware which filters the output in some
> way, but cares not about the input.
>
> Wrapping middleware around an application is simple and effective, but
> costly in terms of stack allocation overhead; it also makes debugging a
> bit more of a nightmare as the stack trace can be quite deep.
>
> My updated draft PEP 444[1] includes a section describing Filters, both
> ingress (input filtering) and egress (output filtering). The API is
> trivially simple, optional (as filters can be easily adapted as
> middleware if the host server doesn't support filters) and easy to
> implement in a server. (The Marrow HTTP/1.1 server implements them as
> two for loops.)
>
> Basically an input filter accepts the environment dictionary and can
> mutate it. Ingress filters take a single positional argument that is
> the environ. The return value is ignored. (This is questionable; it
> may sometimes be good to have ingress filters return responses. Not
> sure about that, though.)
>
> An egress filter accepts the status, headers, body tuple from the
> application and returns a status, headers, and body tuple of its own
> which then replaces the response. An example implementation is:
>
>     for filter_ in ingress_filters:
>         filter_(environ)
>
>     response = application(environ)
>
>     for filter_ in egress_filters:
>         response = filter_(*response)

That looks amazingly like the code for CherryPy Filters circa 2005.
In version 2 of CherryPy, "Filters" were the canonical extension method (for the framework, not WSGI, but the same lessons apply). It was still expensive in terms of stack allocation overhead, because you had to call() each filter to see if it was "on". It would be much better to find a way to write something like:

    for f in ingress_filters:
        if f.on:
            f(environ)

It was also fiendishly difficult to get executed in the right order: if you had a filter that was both ingress and egress, the natural tendency for core developers and users alike was to append each to each list, but this is almost never the correct order.

But even if you solve the issue of static composition, there's still a demand for programmatic composition ("if X then add Y after it"), and even decomposition ("find the caching filter my framework added automatically and turn it off"), and list.insert()/remove() isn't stellar at that.

Calling the filter to ask it whether it is "on" also leads filter developers down the wrong path; you really don't want to have Filter A trying to figure out if some other, conflicting Filter B has already run (or will run soon) that demands Filter A return without executing anything. You really, really want the set of filters to be both statically defined and statically analyzable.

Finally, you want the execution of filters to be configurable per URI and also configurable per controller. So the above should be rewritten again to something like:

    for f in ingress_filters(controller):
        if f.on(environ['PATH_INFO']):
            f(environ)

It was for these reasons that CherryPy 3 ditched its version 2 "filters" and replaced them with "hooks and tools" in version 3.
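The ordering problem Robert describes can be checked with a small trace. Alice's reply suggests that appending each ingress half and prepending each egress half reproduces middleware behaviour. This sketch assumes filter pairs are listed outermost-first and compares the resulting call order with the same pairs nested as ordinary middleware. make_pair and make_middleware are hypothetical helpers, not part of any draft API.

```python
# Check: appending ingress halves and prepending egress halves reproduces the
# call order of the same pairs nested as middleware (outermost listed first).
# make_pair and make_middleware are hypothetical helpers for this sketch.

trace = []


def make_pair(name):
    """Return an (ingress, egress) pair that records when each half runs."""
    def ingress(environ):
        trace.append(name + ".in")

    def egress(status, headers, body):
        trace.append(name + ".out")
        return status, headers, body

    return ingress, egress


def make_middleware(name, app):
    """The same pair expressed as one middleware layer around app."""
    ingress, egress = make_pair(name)

    def middleware(environ):
        ingress(environ)
        return egress(*app(environ))

    return middleware


def app(environ):
    trace.append("app")
    return "200 OK", [], [b""]


names = ["A", "B"]  # outermost first

# Middleware: wrap from the innermost layer outwards.
wrapped = app
for name in reversed(names):
    wrapped = make_middleware(name, wrapped)

# Filters: append to the ingress list, prepend to the egress list.
ingress_filters, egress_filters = [], []
for name in names:
    ingress, egress = make_pair(name)
    ingress_filters.append(ingress)
    egress_filters.insert(0, egress)


def run_filters(environ):
    for filter_ in ingress_filters:
        filter_(environ)
    response = app(environ)
    for filter_ in egress_filters:
        response = filter_(*response)
    return response


trace.clear()
wrapped({})
middleware_order = list(trace)

trace.clear()
run_filters({})
filter_order = list(trace)
```

Both orders come out as A.in, B.in, app, B.out, A.out. Naively appending to both lists would instead run A.out before B.out, which is the mistake Robert describes.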
You might find more insight by studying the latest cherrypy/_cptools.py.

Robert Brewer
fumanchu at aminus.org

From alice at gothcandy.com Mon Dec 13 20:42:02 2010
From: alice at gothcandy.com (Alice Bevan–McGregor)
Date: Mon, 13 Dec 2010 11:42:02 -0800
Subject: [Web-SIG] PEP 444 / WSGI2 Proposal: Filters to suppliment middleware.

> That looks amazingly like the code for CherryPy Filters circa 2005. In
> version 2 of CherryPy, "Filters" were the canonical extension method
> (for the framework, not WSGI, but the same lessons apply). It was still
> expensive in terms of stack allocation overhead, because you had to
> call() each filter to see if it was "on". It would be much better to
> find a way to write something like:
>
>     for f in ingress_filters:
>         if f.on:
>             f(environ)

.on will need to be an @property in most cases, which still doesn't avoid stack allocation and, in fact, doubles the overhead per filter. Statically disabled filters should not be added to the filter list at all.

> It was also fiendishly difficult to get executed in the right order: if
> you had a filter that was both ingress and egress, the natural tendency
> for core developers and users alike was to append each to each list,
> but this is almost never the correct order.

If something is both an ingress and an egress filter, it should be implemented as middleware instead. Nothing can prevent developers from doing bad things if they really try. Appending to ingress and prepending to egress would be the "right" thing to simulate middleware behaviour with filters, but again, don't do that. ;)

> But even if you solve the issue of static composition, there's still a
> demand for programmatic composition ("if X then add Y after it"), and
> even decomposition ("find the caching filter my framework added
> automatically and turn it off"), and list.insert()/remove() isn't
> stellar at that.
I have plans for (and a partial implementation of) an init.d-style "needs/uses/provides" declaration and automatic dependency graphing. WebCore, for example, adds these declarations to existing middleware layers to sort the middleware.

> Calling the filter to ask it whether it is "on" also leads filter
> developers down the wrong path; you really don't want to have Filter A
> trying to figure out if some other, conflicting Filter B has already
> run (or will run soon) that demands Filter A return without executing
> anything. You really, really want the set of filters to be both
> statically defined and statically analyzable.

Unfortunately, most, if not all, filters need to check request headers and response headers to determine whether they can run. E.g. compression checks environ.get('HTTP_ACCEPT_ENCODING', '').lower() for 'gzip', and checks the response to determine whether a 'Content-Encoding' header has already been specified.

> Finally, you want the execution of filters to be configurable per URI
> and also configurable per controller. So the above should be rewritten
> again to something like:
>
>     for f in ingress_filters(controller):
>         if f.on(environ['PATH_INFO']):
>             f(environ)
>
> It was for these reasons that CherryPy 3 ditched its version 2
> "filters" and replaced them with "hooks and tools" in version 3.

This is possible by wrapping multiple applications in the filter middleware adapter, each with a different filter setup, then using the separately wrapped applications with some form of dispatch. You could also use filters as decorators. This is an implementation detail left up to the framework built on WSGI2, however; WSGI2 itself has no concept of "controllers".

None of this prevents the simplified stack from being useful during exception handling, though. ;) What I was really trying to do is reduce the level of nesting on each request and make what used to be middleware more explicit in its purpose.
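The filter middleware adapter Alice mentions (Ian later calls it "a class that takes an app and input and output filters") could look roughly like the sketch below. This is an assumption-laden illustration, not the draft's API. FilterMiddleware, the toy app, and the filters are hypothetical names. Egress filters here take (environ, status, headers, body), following Alice's later note that marrow.server.http passes environ to egress filters. Wrapping the same app twice with different filters and dispatching on the path illustrates the per-URI setup she describes.

```python
class FilterMiddleware:
    """Hypothetical adapter: runs ingress and egress filters around one
    application, for servers without native filter support."""

    def __init__(self, app, ingress=(), egress=()):
        self.app = app
        self.ingress = list(ingress)
        self.egress = list(egress)

    def __call__(self, environ):
        for filter_ in self.ingress:
            filter_(environ)  # mutates environ; return value ignored
        status, headers, body = self.app(environ)
        for filter_ in self.egress:
            status, headers, body = filter_(environ, status, headers, body)
        return status, headers, body


# Toy application and filters (all hypothetical).
def echo_path(environ):
    body = [environ["PATH_INFO"].encode("utf-8")]
    return "200 OK", [("Content-Type", "text/plain")], body


def lowercase_path(environ):
    environ["PATH_INFO"] = environ["PATH_INFO"].lower()


def tag_region(region):
    """Build an egress filter that labels the response with a region name."""
    def egress(environ, status, headers, body):
        return status, headers + [("X-Region", region)], body
    return egress


# Two differently filtered copies of the same app, chosen by path.
admin_app = FilterMiddleware(
    echo_path, ingress=[lowercase_path], egress=[tag_region("admin")]
)
public_app = FilterMiddleware(echo_path, egress=[tag_region("public")])


def dispatch(environ):
    if environ["PATH_INFO"].lower().startswith("/admin"):
        return admin_app(environ)
    return public_app(environ)
```

The trade-off is that each wrapped application adds back one level of nesting, which is the overhead native server-side filters were meant to avoid.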
> You might find more insight by studying the latest cherrypy/_cptools.py

I'll give it a gander, though I firmly believe filter management (like middleware stack management) is the domain of a framework on top of WSGI2, not the domain of the protocol.

- Alice.

From ianb at colorstudy.com Tue Dec 14 21:02:33 2010
From: ianb at colorstudy.com (Ian Bicking)
Date: Tue, 14 Dec 2010 12:02:33 -0800
Subject: [Web-SIG] PEP 444 / WSGI2 Proposal: Filters to suppliment middleware.

On Sun, Dec 12, 2010 at 9:59 PM, Alice Bevan–McGregor wrote:

> Howdy!
>
> There's one issue I've seen repeated a lot in working with WSGI1 and that
> is the use of middleware to process incoming data, but not outgoing, and
> vice-versa; middleware which filters the output in some way, but cares not
> about the input.
>
> Wrapping middleware around an application is simple and effective, but
> costly in terms of stack allocation overhead; it also makes debugging a bit
> more of a nightmare as the stack trace can be quite deep.
>
> My updated draft PEP 444[1] includes a section describing Filters, both
> ingress (input filtering) and egress (output filtering). The API is
> trivially simple, optional (as filters can be easily adapted as middleware
> if the host server doesn't support filters) and easy to implement in a
> server. (The Marrow HTTP/1.1 server implements them as two for loops.)
>

It's not clear to me how this can be composed or abstracted.

@webob.dec.wsgify does kind of handle this with its request/response pattern; in a simplified form it's like:

    def wsgify(func):
        def replacement(environ):
            req = Request(environ)
            resp = func(req)
            return resp(environ)
        return replacement

This allows you to do an output filter like:

    @wsgify
    def output_filter(req):
        resp = some_app(req.environ)
        fiddle_with_resp(resp)
        return resp

(Most output filters also need the request.)
And an input filter like:

    @wsgify
    def input_filter(req):
        fiddle_with_req(req)
        return some_app

But while it handles the input filter case, it doesn't try to generalize this or move application composition into the server. An application is an application and servers are imagined but not actually concrete.

If you handle filters at the server level you have to have some way of registering these filters, and it's unclear what order they should be applied. At import? Does the server have to poke around in the app it is running? How can it traverse down if you have dispatching apps (like paste.urlmap or Routes)?

You can still implement this locally of course, as a class that takes an app and input and output filters.

--
Ian Bicking | http://blog.ianbicking.org

From alice at gothcandy.com Tue Dec 14 21:54:12 2010
From: alice at gothcandy.com (Alice Bevan–McGregor)
Date: Tue, 14 Dec 2010 12:54:12 -0800
Subject: [Web-SIG] PEP 444 / WSGI2 Proposal: Filters to suppliment middleware.

Ian,

> It's not clear to me how this can be composed or abstracted.

Filters themselves have no knowledge of the application, in a similar vein to middleware not knowing whether the next layer in the onion skin is the final application or another bit of well-behaved middleware, except that filters do not get a reference to an "inner" application at all. (They are linear, not nested.)

> (Most output filters also need the request.)

You are quite correct; I'll update the PEP. marrow.server.http already passes environ to egress filters in addition to the status_bytes, headers_list, and body_iter data.

> But while it handles the input filter case, it doesn't try to
> generalize this or move application composition into the server.
A large proportion of the filters I was able to imagine are conditionless: there would be no "path" within your application (controller or otherwise) that would need to modify the majority of them. Egress compression is one example. (And even then, my example egress compression filter offers a documented mechanism to disable it on a per-request basis.)

> An application is an application and servers are imagined but not
> actually concrete.

Could you elaborate? (Define "concrete" in this context.)

> If you handle filters at the server level you have to have some way of
> registering these filters, and it's unclear what order they should be
> applied. At import? Does the server have to poke around in the app it
> is running? How can it traverse down if you have dispatching apps
> (like paste.urlmap or Routes)?

Filters are unaffected by, and unaware of, dispatch. They are defined at the same time your application middleware stack is constructed, and passed (in the current implementation) to the HTTPServer protocol as a list, at the same time as your wrapped application stack.

> You can still implement this locally of course, as a class that takes
> an app and input and output filters.

If you -do- need "region specific" filtering, you can ostensibly wrap multiple final applications in filter management middleware, as you say. That's a fairly advanced use case regardless of filtering.

I would love to see examples of what people might implement as filters (i.e. middleware that does ONE of ingress or egress processing, not both).
From CherryPy I see things like:

* BaseURLFilter (ingress Apache base path adjustments)
* DecodingFilter (ingress request parameter decoding)
* EncodingFilter (egress response header and body encoding)
* GzipFilter (already mentioned)
* LogDebugInfoFilter (egress insertion of page generation time into HTML stream)
* TidyFilter (egress piping of response body to Tidy)
* VirtualHostFilter (similar to BaseURLFilter)

None of these (with the possible exception of LogDebugInfoFilter) I could imagine needing to be path-specific.

- Alice.

From ianb at colorstudy.com Tue Dec 14 22:25:58 2010
From: ianb at colorstudy.com (Ian Bicking)
Date: Tue, 14 Dec 2010 13:25:58 -0800
Subject: [Web-SIG] PEP 444 / WSGI2 Proposal: Filters to suppliment middleware.

On Tue, Dec 14, 2010 at 12:54 PM, Alice Bevan–McGregor wrote:

>
>> An application is an application and servers are imagined but not actually
>> concrete.
>>
>
> Could you elaborate? (Define "concrete" in this context.)

WSGI applications never directly touch the server. They are called by the server, but have no reference to the server. Servers in turn take an app and parameters specific to their serveryness (which may or may not even involve HTTP), but it's good we've gotten them out of the realm of application composition (early on, WSGI servers frequently handled mounting apps at locations in the path, but that's been replaced with dispatching middleware). An application wrapped with middleware is also a single object you can hand around; we don't have an object that represents all of "application, list of pre-filters, list of post-filters".

>
>> If you handle filters at the server level you have to have some way of
>> registering these filters, and it's unclear what order they should be
>> applied. At import? Does the server have to poke around in the app it is
>> running? How can it traverse down if you have dispatching apps (like
>> paste.urlmap or Routes)?
>>
>
> Filters are unaffected by, and unaware of, dispatch. They are defined at
> the same time your application middleware stack is constructed, and passed
> (in the current implementation) to the HTTPServer protocol as a list at the
> same time as your wrapped application stack.
>
>
>> You can still implement this locally of course, as a class that takes an
>> app and input and output filters.
>>
>
> If you -do- need "region specific" filtering, you can ostensibly wrap
> multiple final applications in filter management middleware, as you say.
> That's a fairly advanced use-case regardless of filtering.
>
> I would love to see examples of what people might implement as filters
> (i.e. middleware that does ONE of ingress or egress processing, not both).
> From CherryPy I see things like:
>
> * BaseURLFilter (ingress Apache base path adjustments)
> * DecodingFilter (ingress request parameter decoding)
> * EncodingFilter (egress response header and body encoding)
> * GzipFilter (already mentioned)
> * LogDebugInfoFilter (egress insertion of page generation time into HTML stream)
> * TidyFilter (egress piping of response body to Tidy)
> * VirtualHostFilter (similar to BaseURLFilter)
>
> None of these (with the possible exception of LogDebugInfoFilter) I could
> imagine needing to be path-specific.

GzipFilter is wonky at best (it interacts oddly with range requests and ETags). Prefix handling is useful (e.g., paste.deploy.config.PrefixMiddleware), and usually global and unconfigured. Debugging and logging stuff often needs per-path configuration, which can mean multiple instances applied after dispatch. Encoding and Decoding don't apply to WSGI. Tidy is intrusive and I think questionable on a global level.

I don't think the use cases are there. Tightly bound pre-filters and post-filters are particularly problematic. This all seems like a lot of work to avoid a few stack frames in a traceback.
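Alice's earlier description of the compression filter's checks translates into a short egress filter. The checks are that Accept-Encoding contains "gzip" and that no Content-Encoding header is already set. This is a hypothetical sketch, not her filter or CherryPy's GzipFilter. It assumes the (environ, status, headers, body) egress signature she mentions. Given Ian's caveat, it simply passes 206 partial responses through untouched. It does not adjust ETags, so it only partly addresses his objection.

```python
import gzip


def gzip_egress(environ, status, headers, body):
    """Hypothetical egress filter: gzip the body when the client accepts it.

    Assumes egress filters receive (environ, status, headers, body).
    """
    # Crude substring test, as described in the thread; q-values are ignored.
    accepts_gzip = "gzip" in environ.get("HTTP_ACCEPT_ENCODING", "").lower()
    already_encoded = any(name.lower() == "content-encoding" for name, _ in headers)
    partial = status.startswith("206")  # leave range responses alone
    if not accepts_gzip or already_encoded or partial:
        return status, headers, body

    data = gzip.compress(b"".join(body))
    # Build a new header list: drop the stale length, describe the new body.
    headers = [(n, v) for n, v in headers if n.lower() != "content-length"]
    headers += [
        ("Content-Encoding", "gzip"),
        ("Content-Length", str(len(data))),
        ("Vary", "Accept-Encoding"),
    ]
    return status, headers, [data]
```

This buffers the entire body in memory before compressing. A server streaming large responses would want an incremental compressor instead.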
--
Ian Bicking | http://blog.ianbicking.org

From alice at gothcandy.com Fri Dec 24 10:07:30 2010
From: alice at gothcandy.com (Alice Bevan–McGregor)
Date: Fri, 24 Dec 2010 01:07:30 -0800
Subject: [Web-SIG] PEP 444 Draft Rewrite

Howdy!

I've mostly finished a draft rewrite of PEP 444 (WSGI 2), incorporating some additional ideas covering things like py2k/py3k interoperability, and switching from a more narrative style to substantially RFC-inspired language.

http://bit.ly/e7rtI6

I'm using Textile as my intermediary format, and will obviously need to convert this to reStructuredText when I'm done.

Missing are:

* The majority of the examples.
* Narrative rationale, which I'll be writing shortly.
* Narrative Python compatibility documentation.
* Asynchronous documentation. This will likely rely on the abstract API defined in PEP 3148 (futures) as implemented in Python 3.2 and the "futures" package available on PyPI.
* Additional and complete references. The Rationale chapter will add many references to community discussion.

I would greatly appreciate it if people could read through this rewrite and raise questions, corrections, or even possible ambiguities in discussion.

Happy holidays and a merry new year, everybody! :)

- Alice.

P.S. I'll be updating my PEP 444 reference implementation HTTP/1.1 server (marrow.server.http) over the holidays to incorporate the changes in this rewrite, most notably the separation of byte strings, unicode strings, and native strings.