[Web-SIG] Standardized configuration

Tue Jul 19 05:49:44 CEST 2005

Chris McDonough wrote:
> On Sun, 2005-07-17 at 03:16 -0500, Ian Bicking wrote:
> 
>>This is what Paste does in configuration, like:
>>
>>middleware.extend([
>>     SessionMiddleware, IdentificationMiddleware,
>>     AuthenticationMiddleware, ChallengeMiddleware])
>>
>>This kind of middleware takes a single argument, which is the 
>>application it will wrap.  In practice, this means all the other 
>>parameters go into lazily-read configuration.
> 
> 
> I'm finding it hard to imagine a reason to have another kind of
> middleware.
> 
> Well, actually that's not true.  In noodling about this, I did think it
> would be kind of neat in a twisted way to have "decision middleware"
> like:

In addition to the examples I gave in response to Graham, I wrote a 
document on this a while ago: 
http://pythonpaste.org/docs/url-parsing-with-wsgi.html

The hard part about this is configuration; it's easy to configure a 
non-branching chain of middleware.  Once it branches the configuration 
becomes hard (like programming-hard; which isn't *hard*, but it quickly 
stops feeling like configuration).
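
To make that concrete, here is a rough sketch (not Paste's actual code). `build_pipeline`, `tag`, `branch`, and the `demo.trace` key are names invented for illustration, and I'm assuming the first middleware in the list ends up outermost:

```python
def build_pipeline(app, middleware):
    """Wrap app in each middleware; the first entry ends up outermost."""
    for factory in reversed(middleware):
        app = factory(app)
    return app


def tag(name):
    """Toy middleware factory: records its name in environ['demo.trace']."""
    def factory(app):
        def wrapped(environ, start_response):
            environ.setdefault('demo.trace', []).append(name)
            return app(environ, start_response)
        return wrapped
    return factory


def branch(routes, default):
    """Toy 'decision' app: dispatch on the first matching path prefix."""
    def chooser(environ, start_response):
        path = environ.get('PATH_INFO', '')
        for prefix, app in routes:
            if path.startswith(prefix):
                return app(environ, start_response)
        return default(environ, start_response)
    return chooser


def show_trace(environ, start_response):
    """Terminal app: reports which middleware the request passed through."""
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [','.join(environ['demo.trace']).encode('ascii')]


# Linear: a flat list is all the "configuration" there is.
linear = build_pipeline(show_trace, [tag('session'), tag('auth')])

# Branching: the structure is now a tree, and it reads like a program.
branching = build_pipeline(
    branch([('/admin', build_pipeline(show_trace, [tag('auth')]))],
           default=show_trace),
    [tag('session')])
```

The linear case could be written down as a plain list in a config file; the branching case already needs nesting and a notion of "default".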

>>You can also define a "framework" (a plugin to Paste), which in addition 
>>to finding an "app" can also add middleware; basically embodying all the 
>>middleware that is typical for a framework.
> 
> 
> This appears to be what I'm trying to do too, which is why I'm intrigued
> by Paste.
> 
> OTOH, I'm not sure that I want my framework to "find" an app for me.
> I'd like to be able to define pipelines that include my app, but I'd
> typically just want to statically declare it as the end point of a
> pipeline composed of service middleware.  I should look at Paste a
> little more to see if it has the same philosophy or if I'm
> misunderstanding you.

Mostly I wanted to avoid lots of magical incantations for the simple 
case.  If you are used to Webware, well, it has a very straightforward 
way of finding your application -- you give it a directory name.  If 
Quixote or CherryPy, you give it a root object.  Maybe Zope would take a 
ZEO connection string, and so on.

>>Paste is really a deployment configuration.  Well, that as well as stuff 
>>to deploy.  And two frameworks.  And whatever else I feel a need or 
>>desire to throw in there.
> 
> 
> Yeah.  FWIW, as someone who has recently taken a brief look at Paste, I
> think it would be helpful (at least for newbies) to partition out the
> bits of Paste which are meant to be deployment configuration from the
> bits that are meant to be deployed.  Zope 2 fell into the same trap
> early on, and never recovered.  For example, ZPublisher (nee Bobo) was
> always meant to be able to be useful outside of Zope, but in practice it
> never happened because nobody could figure out how to disentangle it
> from its ever-increasing dependencies on other software only found in a
> Zope checkout.  In the end, nobody even remembered what its dependencies
> were *supposed* to be.  If you ask ten people, you'd get ten different
> answers.

Maybe with setuptools' namespace packages I can try this sometime.  It's 
not a high priority, though if splitting pieces out would make them more 
appealing then I could do that.

Deployment doesn't actually interest me, it's just a pain in the ass and 
I wanted to give it a go.  There's no real competition that I know of, 
because it's a boring and annoying problem ;)  So if I split it off, it 
might become accidentally orphaned...

> I also think that the rigor of separating out different components helps
> to make the software stronger and more easily understood in bite-sized
> pieces.  Unfortunately, separating them makes configuration tough, but I
> think finding "the right way" to do that is what we're trying to work
> out here.

Yes, you've reminded me why I brought this up, for that exact reason, 
though we've digressed a great deal.  Lots of pieces of Paste have zero 
(or close to it) dependencies, except for configuration.  That's what 
distinguishes a Paste component from a generic WSGI component, and I'm 
just as happy if there is no distinction.

>>Note also that parts of the pipeline are very much late bound.  For 
>>instance, the way I implemented Webware (and Wareweb) each servlet is a 
>>WSGI application.  So while there's one URLParser application, the 
>>application that actually handles the request differs per request.  If 
>>you start hanging more complete applications (that might have their own 
>>middleware) at different URLs, then this happens more generally.
> 
> 
> Well, if you put the "decider" in middleware itself, all of the
> middleware components in each pipeline could still be at least
> constructed early.  I'm pretty sure this doesn't really strictly qualify
> as "early binding" but it's not terribly dynamic either.  It also makes
> configuration pretty straightforward.  At least I can imagine a
> declarative syntax for configuring pipelines this way.

This is close to how Paste works now.  The typical middleware stack does 
everything but find the terminal application object, though with hooks 
if you are inclined to add yet more middleware (like the Paste 
examples.filebrowser.web.__init__.application() object I mentioned before).

> I'm pretty sure you're not advocating it, but in case you are, I'm not
> sure it adds as much value as it removes to be able to have a "dynamic"
> middleware chain whereby new middleware elements can be added "on the
> fly" to a pipeline after a request has begun.  That is *very* "late
> binding" to me and it's impossible to configure declaratively.

I'm comfortable with a little of both.  I don't even know *how* I'd stop 
dynamic middleware.  For instance, one of the methods I added to Wareweb 
recently allows any servlet to forward to any WSGI application; but from 
the outside the servlet looks like a normal WSGI application just like 
before.

I guess this is part of the advantage (and disadvantage) of completely 
opaque applications; you don't and can't know what they do.
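
A minimal sketch of what I mean by forwarding. This is not the Wareweb method itself; the three application names are made up:

```python
def hello_app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'hello']


def legacy_app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'legacy']


def front_app(environ, start_response):
    """Looks like one plain WSGI app; the forwarding is invisible outside."""
    if environ.get('PATH_INFO', '').startswith('/old'):
        # Decided per request, after the request has already begun.
        return legacy_app(environ, start_response)
    return hello_app(environ, start_response)
```

Anything wrapping `front_app` sees a single callable, so nothing in a declarative pipeline description could forbid (or even notice) the hand-off.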

>>>I've just seen Phillip's post where he implies that this kind of
>>>fine-grained component factoring wasn't really the initial purpose of
>>>WSGI middleware.  That's kind of a bummer. ;-)
>>
>>Well, I don't understand the services he's proposing yet.  I'm quite 
>>happy with using middleware the way I have been, so I'm not seeing a 
>>problem with it, and there's lots of benefits.
> 
> 
> I agree!  I'm a bit confused because one of the canonical examples of
> how WSGI middleware is useful seems to be the example of implementing a
> framework-agnostic sessioning service.  And for that sessioning service
> to be useful, your application has to be able to depend on its
> availability so it can't be "oblivious".

This is where I'd like additional (incrementally agreed upon) standards. 
For instance, a standard for the interface of 'webapp01.session'. 
It's a requirement, certainly, but the requirement is merely "there must 
be a webapp01-compliant session installed".
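
As a sketch of how small that requirement could be: no such standard exists yet, so the dict-like interface, `MissingService`, `require`, and the toy provider below are all assumptions of mine.

```python
SESSION_KEY = 'webapp01.session'


class MissingService(Exception):
    """Raised when the deployment did not install a required service."""


def require(environ, key):
    """Fetch a service from environ, failing loudly if it is absent."""
    try:
        return environ[key]
    except KeyError:
        raise MissingService(
            'no %r in environ; check the middleware stack' % key)


def counter_app(environ, start_response):
    """Depends only on *a* compliant session, not on who provides it."""
    session = require(environ, SESSION_KEY)
    session['hits'] = session.get('hits', 0) + 1
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [str(session['hits']).encode('ascii')]


def in_memory_session(app):
    """Toy provider: one shared dict, no cookies, not for real use."""
    store = {}

    def wrapped(environ, start_response):
        environ[SESSION_KEY] = store
        return app(environ, start_response)
    return wrapped
```

The application names the key it needs and nothing else; which middleware satisfies it is left to deployment.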

> OTOH, the primary benefit -- to me, at least -- of modeling services as
> WSGI middleware is the fact that someone else might be able to use my
> service outside the scope of my projects (and thus help maintain it and
> find bugs, etc).  So if I've got the wrong concept of what kinds of
> middleware that I can expect "normal" people to use, I don't want to go
> very far down that road without listening carefully to Phillip.  Perhaps
> I'll have a shot at influencing the direction of WSGI to make it more
> appropriate for this sort of thing or maybe we'll come up with a better
> way of doing it.

Well, you can go some ways.  If you are distributing an application -- 
which can be very fine-grained -- you can always resort to invoking 
middleware yourself.  If you are distributing middleware or a library 
that depends on middleware, then dependencies are part of the deployment 
configuration.  Which has always been the case.

Also, a smart middleware can pretend to be many kinds of middleware, by 
putting objects with different (wrapper) interfaces in multiple keys. 
So if we have an explosion of incompatible session middlewares, for 
instance, we can ultimately create an ubersession that maintains 
backward compatibility and provides a forward-compatible interface.
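
Something along these lines, as a sketch only. The `oldframework.session` key and its method-style interface are hypothetical stand-ins for whatever incompatible API needs supporting:

```python
class MethodStyleSession(object):
    """Wrapper exposing the shared dict through a (hypothetical) older API."""

    def __init__(self, data):
        self._data = data

    def get_value(self, name, default=None):
        return self._data.get(name, default)

    def set_value(self, name, value):
        self._data[name] = value


def ubersession(app):
    """One middleware, one store, several interfaces under several keys."""
    store = {}

    def wrapped(environ, start_response):
        # Dict-style interface for newer code.
        environ['webapp01.session'] = store
        # Method-style wrapper over the same data for older code.
        environ['oldframework.session'] = MethodStyleSession(store)
        return app(environ, start_response)
    return wrapped
```

Both keys see the same underlying data, so a value written through one interface is readable through the other.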

> Zope 3 is a component system much like what I'm after, and I may just
> end up using it wholesale.  But my immediate problem with Zope 3 is that
> like Zope 2, it's a collection of libraries that have dependencies on
> other libraries that are only included within its own checkout and don't
> yet have much of a life of their own.  It's not really a technical
> problem, it's a social one... I'd rather have a somewhat messy framework
> with a lot of diversity composed of wildly differing component
> implementations that have a life of their own than to be trapped in a
> clean, pure world where all the components are used only within that
> world.

My personal critique would be that Zope 3 adds novel concepts more than 
libraries, and they are better concepts than in Zope 2 (where "concept" 
was just whatever got thrown into the most base classes), but there's 
still a lot of concept there.  Some of them deserve to become part of 
the wider Python knowledge base.  I think some of them don't.  But 
there's no survival of the fittest, since the concepts depend on each other.

> I suspect there's a middle ground here somewhere.
> 
> 
>>>Factoring middleware components in this way seems to provide clear
>>>demarcation points for reuse and maintenance.  For example, I imagined a
>>>declarative security module that might be factored as a piece of
>>>middleware here:  http://www.plope.com/Members/chrism/decsec_proposal .
>>
>>Yes, I read that before; I haven't quite figured out how to digest it, 
>>though.  This is probably in part because of the resource-based 
>>orientation of Zope, and WSGI is application-based, where applications 
>>are rather opaque and defined only in terms of function.
> 
> 
> Yes, it is a bit Zopeish because it assumes content lives at a path.
> This isn't always the case, I know, but it often is.  Well, it's a bit
> of a stretch, but an alternate decsec implementation might use a
> "content identifier" to determine the protection of a resource instead
> of a full path.
> 
> For example, if you're implementing an application that is very simple
> and takes one and only one URL, but calls it with a different query
> string variable to display different pieces of content (e.g.
> '/blog?entry_num=1234'), you might have one ACL as the "root" ACL but
> optionally protect each piece of content with a separate ACL if one can
> be found.  Maybe the content-specific ACL would be 'entry_num=1234'
> instead of a path.  

Zope really puts a lot of importance in paths; though I don't think 
typical Zope applications have any better URLs as a result.  I don't 
know if that's something specific to Zope, or merely the inevitable 
result that when you make something Important you make it Hard and 
Fragile.  I'd actually go for the latter, which is why I'd be very 
reluctant to make URL-based permissions anything more than one tool 
among many.

Something like services seems more practical in this case, or perhaps an 
advisory object that gets placed in the request if we're seeing what we 
can do without services.  The advisory object doesn't know what the 
entry_num=1234 object is, but the application can figure out how that 
object maps to what the advisory object knows about (e.g., owners and 
editors and whatnot).

But oh! that's exactly what you describe below.  With all these long 
emails I don't have the room in my brain to read ahead, because it all 
becomes a jumble of WSGIness.  Which is good, just hard...

> A function that accepts a form post for displaying
> or changing the blog entry for 1234 might look like this:
> 
> def blog(environ, start_response):
>     acl = environ['acl'] # added by decsec middleware
>     userid = environ['userid'] # added by an authentication middleware
>     formvars = get_form_vars_from(environ)
>     if formvars['action'] == "view":
>         permission = 'view'
>     elif formvars['action'] == "change":
>         permission = 'edit'
>     content = get_blog_entry(environ)
>     # pulls out the entry for 1234
>     if not acl.check(userid, permission):
>         start_response('401 Unauthorized', [])
>         return ['<html>Unauthorized</html>']
>     # ... further code to change or display the blog entry ...
> 
> The ACL could be the "root" ACL (say, all users can view, members of the
> group "manager" could change, everything else is denied).  The "root"
> ACL would be used if content did not have its own ACL.  But associating
> an ACL with a content identifier would allow the developer or site
> manager to protect individual blog entries (e.g. 1234, 5678, etc) with
> different ACLs.  "Joe can view this one but he can't change it", "Jim
> can view all of them and can change all of them", etc.. the sorts of
> things useful for "staging" and workflow delegation without unduly
> mucking up the actual application code.
> 
> Decsec would also take into account the user's group memberships and so
> forth during the "check" step, so you wouldn't have to write any of this
> code either.  The "blog" example is stupid, of course, the concept is
> more useful for higher-security apps.
> 
> Sorry, all of this is somewhat beside the point of this thread, but it
> does provide an example of the kind of functionality I'd like to be able to
> put into middleware.
> 
> 
>>>Of course, this sort of thing doesn't *need* to be middleware.  But
>>>making it middleware feels very right to me in terms of being able to
>>>deglom nice features inspired by Zope and other frameworks into pieces
>>>that are easy to recombine as necessary.  Implementations as WSGI
>>>middleware seems a nice way to move these kinds of features out of our
>>>respective applications and into more application-agnostic pieces that
>>>are very loosely coupled, but perhaps I'm taking it too far.
>>
>>Certainly these pieces of code can apply to multiple applications and 
>>disparate systems.  The most obvious instance right now that I think of 
>>is a WSGI WebDAV server (and someone's working on that for Google Summer 
>>of Code), which should be implemented pretty framework-free, simply 
>>because a good WebDAV implementation works at a low level.  But 
>>obviously you want that to work with the same authentication as other 
>>parts of the system.
> 
> 
> Yes.  In particular, if you knew you were working with an application
> that could resolve a path in terms of containers and contained pieces of
> content (just like a filesystem does), it would be pretty easy to code
> up a DAV "action middleware" component that rendered containerish things
> as DAV "collections" and contentish things as DAV "resources", and which
> could handle DAV locking and property rendering and so forth.
> 
> This kind of middleware might be tough, though, because it probably
> requires explicit cooperation from the end-point application (it expects
> to be talking to an actual filesystem, but that won't always be the case
> at least without some sort of adaptation).

I think WebDAV is very unripe for WSGI abstractions.  If I remember the 
Zope WebDAV code I briefly looked at correctly, it special-cases all 
sorts of things (e.g., based on user agent) because there are so many more 
semantics than with a normal web page.  It's the kind of place where 
introspection really would be helpful; though maybe the discipline of 
enforced decoupling would still help.

> But in any case, it's a good example of how we could prevent people from
> needing to reinvent the wheel... this guy appears to be coming up with
> his own identification, authentication, authorization, and challenge
> libraries entirely http://cwho.blogspot.com/ which just feels very
> wasteful.

Yes; I'm his advisor.  I've encouraged him to look at reusing stuff, but 
I really have to give stronger direction.

>>>Virtual hosting awareness
>>
>>I've never had a problem with this, except in Zope...
>>
>>Anyway, to me this feels like a kind of URL parsing.  One of the 
>>mini-proposals I made before involved a way for URL parsers to add URL 
>>variables to the system (basically a standard WSGI key to put URL 
>>variables as a dictionary).  So a pattern like:
>>
>>   (?P<username>.*)\.myblogspace\.com/(?P<year>\d\d\d\d)/(?P<month>\d\d)/
>>
>>Would add username, year, and month variables to the system.  But regex 
>>matching is just one way; the *result* of parsing is usually either in 
>>the object (e.g., you use domains to get entirely different sites), or 
>>in terms of these variables.
> 
> 
> Yes, this seems to be more of a problem for Zope because it's a) a
> long-running app with its own webserver b) has convenience functions for
> generating URLs based on its internal containment graph and c) doesn't
> deal well with relative URLs.  So if you want an application that lives
> in a "subfolder" of your Zope object graph to behave as if it lives at
> "http://example.com" instead of "http://example.com/subfolder", you need
> to give it clues.

Incidentally, since this is frequently a problem, for my applications 
I've been using something bookmark-like; at some point in the request 
(often just before URLParser is invoked) I store the SCRIPT_NAME and 
give it some name (like 'app_name.base_url').  Then I can construct all 
my URLs relative to that.  This still involves information I keep in my 
head (like how internal URLs are constructed), but at least it gets it 
right without hardcoding/configuring URLs, or being clever and getting 
it wrong.
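
Roughly like this. It's a sketch, not the code I actually run; `remember_base_url` and `url_for` are names made up for the example:

```python
def remember_base_url(app, key='app_name.base_url'):
    """Record SCRIPT_NAME before any URL parser starts consuming the path."""
    def wrapped(environ, start_response):
        environ[key] = environ.get('SCRIPT_NAME', '')
        return app(environ, start_response)
    return wrapped


def url_for(environ, path, key='app_name.base_url'):
    """Build an application-internal URL relative to the stored base."""
    return environ[key].rstrip('/') + '/' + path.lstrip('/')
```

Later parsers can move as many segments from PATH_INFO into SCRIPT_NAME as they like; the bookmark still points at the root of the application.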

>>>Transformation during rendering
>>
>>If you mean what I think -- e.g., rendering XSL -- I think WSGI is ripe 
>>for this sort of thing.
> 
> 
> Yes, that's what I meant.

Incidentally someone just did an XSLT middleware today: 
http://www.decafbad.com/blog/2005/07/18/discovering_wsgi_and_xslt_as_middleware
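
The general shape of that kind of middleware, as a sketch: this is not the code from that post, and `transform` here is any bytes-to-bytes callable standing in for an actual XSLT step.

```python
def transform_body(app, transform):
    """Buffer the wrapped app's output and run transform(bytes) -> bytes."""
    def wrapped(environ, start_response):
        captured = {}

        def capture(status, headers, exc_info=None):
            captured['status'] = status
            captured['headers'] = list(headers)
            # Sketch limitation: data sent through write() is dropped,
            # so only apps that return an iterable body are supported.
            return lambda data: None

        result = app(environ, capture)
        try:
            body = transform(b''.join(result))
        finally:
            if hasattr(result, 'close'):
                result.close()

        # The body length may have changed, so recompute Content-Length.
        headers = [(k, v) for k, v in captured['headers']
                   if k.lower() != 'content-length']
        headers.append(('Content-Length', str(len(body))))
        start_response(captured['status'], headers)
        return [body]
    return wrapped
```

Buffering the whole response gives up streaming, which is usually acceptable for a whole-document transformation.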

-- 
Ian Bicking  /  ianb at colorstudy.com  / http://blog.ianbicking.org