[Web-SIG] WSGI deployment use case

Tue Jul 26 04:54:01 CEST 2005

Phillip J. Eby wrote:
> At 08:29 PM 7/25/2005 -0500, Ian Bicking wrote:
> 
>> Right now Paste hands around a fairly flat dictionary.  This 
>> dictionary is passed around in full (as part of the WSGI environment) 
>> to every piece of middleware, and actually to everything (via an 
>> import and threadlocal storage).  It gets used all over the place, and 
>> the ability to draw in configuration without passing it around is very 
>> important.  I know it seems like heavy coupling, but in practice it 
>> causes unstable APIs if it is passed around explicitly, and as long as 
>> you keep clever dynamic values out of the configuration it isn't a 
>> problem.
>>
>> Anyway, every piece gets the full dictionary, so if any piece expected 
>> a constrained set of keys it would break.  Even ignoring that there 
>> are multiple consumers with different keys that they pull out, it is 
>> common to create intermediate configuration values to make the 
>> configuration more abstract.  E.g., I set a "base_dir", then derive 
>> "publish_dir" and "template_dir" from that.  Apache configuration is a 
>> good anti-example here; its lack of variables hurts me daily.  While 
>> some variables could be declared "abstract" somehow, that adds 
>> complexity where the unconstrained model avoids that complexity.
> 
> 
> *shudder* I think someone just walked over my grave.  ;)
> 
> I'd rather add complexity to the deployment format (e.g. variables, 
> interpolation, etc.) to handle this sort of thing than add complexity to 
> the components.  I also find it hard to understand why e.g. multiple 
> components would need the same "template_dir".  Why isn't there a 
> template service component, for example?

In that case, no, multiple components are unlikely to usefully share 
template_dir.  But that's not an issue I'm really hitting, though it 
does make the order in which configuration files are loaded more 
important.
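To make the ordering point concrete, here is a minimal Python sketch. It assumes a flat configuration built by merging dictionaries in load order and then filling in values derived from "base_dir", as in the base_dir / publish_dir / template_dir example quoted above. The helper names, subdirectory names, and paths are hypothetical, not Paste APIs.

```python
import os


# Hypothetical helpers; not part of Paste or any WSGI spec.
def merge_layers(*layers):
    """Merge configuration dicts left to right; later layers override earlier ones."""
    config = {}
    for layer in layers:
        config.update(layer)
    return config


def add_derived(config):
    """Derive publish_dir/template_dir from base_dir unless they were set explicitly."""
    result = dict(config)
    base = result.get("base_dir")
    if base is not None:
        result.setdefault("publish_dir", os.path.join(base, "publish"))
        result.setdefault("template_dir", os.path.join(base, "templates"))
    return result


site_defaults = {"base_dir": "/srv/site"}                # hypothetical file 1
local_overrides = {"base_dir": "/home/me/site",          # hypothetical file 2
                   "template_dir": "/opt/templates"}

config = add_derived(merge_layers(site_defaults, local_overrides))
swapped = add_derived(merge_layers(local_overrides, site_defaults))
```

Because derivation happens after merging, whichever file wins for base_dir also decides publish_dir: it lands under /home/me/site in `config` and under /srv/site in `swapped`, while the explicitly set template_dir survives either way.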

>> When one piece delegates to another, it passes the entire dictionary 
>> through (by convention, and by the fact it gets passed around 
>> implicitly).  It is certainly possible in some circumstances that a 
>> filtered version of the configuration should be passed in; that hasn't 
>> happened to me yet, but I can certainly imagine it being necessary 
>> (especially when a larger amount of more diverse software is running 
>> in the same process).
>>
>> One downside of this is that there's no protection from name 
>> conflicts.  Though name conflicts can go both ways.  The Happy 
>> Coincidence is when two pieces use the same name for the same purpose 
>> (e.g., it's highly likely "smtp_server" would be the subject of a 
>> Happy Coincidence).  An Unhappy Coincidence is when two pieces use the 
>> same name for different purposes ("publish_dir" perhaps).  An 
>> Expected Coincidence is when the same code, invoked in two separate 
>> call stacks, consumes the same value.  Of course, I allow 
>> configuration to be overridden depending on the request, so 
>> high-collision names (like publish_dir) in practice are unlikely to be a 
>> problem.
> 
> 
> I think you've just explained why this approach doesn't scale very well, 
> even to a large team, let alone to inter-organization collaboration 
> (i.e. open source projects).

I admit there are problems.  On the other hand, it's similar to the 
problem that attributes on objects don't have namespaces: it causes 
problems, but those problems aren't so bad in practice.
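The per-request override mentioned in the quoted text above might look something like this minimal Python sketch: a WSGI middleware that, for paths under a prefix, hands the wrapped application a copy of the flat configuration with some keys replaced. The environ key "example.config" and the class name are hypothetical; this is not a description of how Paste actually does it.

```python
CONFIG_KEY = "example.config"  # hypothetical environ key, not a Paste API


class OverrideConfig:
    """Override some keys of the shared flat config for requests under `prefix`."""

    def __init__(self, app, prefix, overrides):
        self.app = app
        self.prefix = prefix
        self.overrides = overrides

    def __call__(self, environ, start_response):
        # Naive prefix test: "/blog" would also match "/blogroll".
        if environ.get("PATH_INFO", "").startswith(self.prefix):
            # Copy the config so the override never leaks into other requests.
            config = dict(environ.get(CONFIG_KEY, {}))
            config.update(self.overrides)
            environ = dict(environ)
            environ[CONFIG_KEY] = config
        return self.app(environ, start_response)


def show_publish_dir(environ, start_response):
    # Stand-in application that reports the publish_dir it was given.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [environ[CONFIG_KEY]["publish_dir"].encode("utf-8")]
```

Wrapping `show_publish_dir` as `OverrideConfig(show_publish_dir, "/blog", {"publish_dir": "/srv/blog-pub"})` gives /blog requests their own publish_dir while every other path still sees the shared value.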

If you can offer something where configuration can be applied to a set 
of components without exposing the internal structure of those 
components, and without the frontend explicitly copying each piece 
destined for an internal application, then great.  I'm not closed to other 
ideas, but I'm not happy putting it off either.  Back when I started up 
this WSGI thread, it was about just this issue, so it's one of the 
things I'm fairly concerned about.

Unlike deployment, this issue of configuration touches all of my code. 
So I'm happier putting off deployment: although the current situation 
is suboptimal, I suspect my code will be forward-compatible with 
whatever we settle on without great effort.
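The model described in the first quoted paragraph (one flat dictionary in the WSGI environment, also reachable through an import and thread-local storage) can be sketched roughly as follows. This is a minimal Python illustration under assumptions: the environ key, class, and function names are hypothetical, and it ignores response iterables that are consumed after the call returns.

```python
import threading

CONFIG_KEY = "example.config"  # hypothetical environ key
_local = threading.local()      # per-thread slot for the current request's config


def current_config():
    """Return the configuration for the request being handled on this thread."""
    return _local.config


class ConfigMiddleware:
    """Publish one flat config dict via the environ and a thread-local."""

    def __init__(self, app, config):
        self.app = app
        self.config = config

    def __call__(self, environ, start_response):
        environ[CONFIG_KEY] = self.config
        # Overwritten at the start of every request handled on this thread.
        _local.config = self.config
        return self.app(environ, start_response)


def mail_footer():
    # Deep inside application code: no config argument is threaded through.
    return "Outgoing mail goes via " + current_config()["smtp_server"]


def app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [mail_footer().encode("utf-8")]


site = ConfigMiddleware(app, {"smtp_server": "mail.example.com"})
```

Here mail_footer() reads smtp_server without it appearing in any signature, which is both the convenience described in the quoted text and the coupling being objected to.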

>> For instance, an application-specific middleware that could plausibly 
>> be used more widely -- does it consume the application configuration, 
>> or does it take its own configuration?  But even excluding those 
>> ambiguous situations, the way my middleware is factored is an internal 
>> implementation detail, and I don't feel comfortable pushing that 
>> structure into the configuration.
> 
> 
> That's what encapsulation is for.  Just create a factory that takes a 
> set of application-level parameters (like template_dir, publish_dir, 
> etc.) and then *passes* them to the lower level components.
> 
> Heck, we could even add that to the .wsgi format...
> 
>    # app template file
>    [WSGI options]
>    parameters = "template_dir", "publish_dir", ...
> 
>    [filter1 from foo]
>    some_param = template_dir
> 
>    [filter2 from bar]
>    other_param = publish_dir
> 
> 
>    # deployment file
>    [use file "app_template.wsgi"]
>    template_dir = "/some/where"
>    publish_dir = "/another/place"
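In Python terms, the encapsulation described above (a factory that takes application-level parameters and passes them on to lower-level components) might look like this minimal sketch. Filter1, Filter2, inner_app and make_app are hypothetical stand-ins for the filter1/filter2 sections of the .wsgi example, not real components.

```python
# Hypothetical stand-ins for "filter1 from foo" and "filter2 from bar".
class Filter1:
    def __init__(self, app, some_param):
        self.app = app
        self.some_param = some_param

    def __call__(self, environ, start_response):
        environ["filter1.some_param"] = self.some_param
        return self.app(environ, start_response)


class Filter2:
    def __init__(self, app, other_param):
        self.app = app
        self.other_param = other_param

    def __call__(self, environ, start_response):
        environ["filter2.other_param"] = self.other_param
        return self.app(environ, start_response)


def inner_app(environ, start_response):
    # Reports what the filters received, so the wiring is visible.
    start_response("200 OK", [("Content-Type", "text/plain")])
    text = environ["filter1.some_param"] + " " + environ["filter2.other_param"]
    return [text.encode("utf-8")]


def make_app(template_dir, publish_dir):
    """Public factory: deployers see only these two parameters."""
    app = Filter2(inner_app, other_param=publish_dir)
    return Filter1(app, some_param=template_dir)


app = make_app(template_dir="/some/where", publish_dir="/another/place")
```

Reordering or replacing the internal filters changes only make_app, not the deployment file that supplies template_dir and publish_dir.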

I'm not clear exactly what you are proposing.  Let's use a more 
realistic example.  Components:

* Exception catcher.  Takes "email_errors", which is a list of addresses 
to email exceptions to.  I want to apply this globally.

* An application mounted on /, which takes "document_root" and serves up 
those files directly.

* An application mounted at /blog, takes "database" (a string) where all 
its information is kept.

* An application mounted at /admin.  Takes "document_root", which is 
where the editable files are located.  Around it go two pieces of 
middleware...

* An authentication middleware, which takes "database", where user 
information is kept.  And...

* An authorization middleware, which takes "allowed_roles" and checks 
them against the roles the authentication middleware puts into the 
environment.

How would I configure that?
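One possible answer, in the explicit-parameter style argued for in the quoted replies, is a plain Python factory that wires the components above together. This is a minimal sketch: every class and function is a hypothetical stand-in (authentication here just trusts an X-Roles header instead of consulting its database), and the dispatcher does not adjust SCRIPT_NAME/PATH_INFO as a real one should.

```python
import sys


# Everything below is a hypothetical stand-in, not a real Paste component.
def make_text_app(message):
    """Minimal WSGI app that just reports how it was configured."""
    def app(environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [message.encode("utf-8")]
    return app


def make_static(document_root):
    return make_text_app("static files from " + document_root)


def make_blog(database):
    return make_text_app("blog stored in " + database)


def make_admin(document_root):
    return make_text_app("editing files in " + document_root)


class Authentication:
    """Records the user's roles in environ; a real one would query `database`."""
    def __init__(self, app, database):
        self.app, self.database = app, database

    def __call__(self, environ, start_response):
        # Stand-in for a lookup in self.database: trust an X-Roles header.
        raw = environ.get("HTTP_X_ROLES", "")
        environ["example.roles"] = {r.strip() for r in raw.split(",") if r.strip()}
        return self.app(environ, start_response)


class Authorization:
    """Checks the roles that Authentication stored against allowed_roles."""
    def __init__(self, app, allowed_roles):
        self.app, self.allowed_roles = app, set(allowed_roles)

    def __call__(self, environ, start_response):
        if self.allowed_roles & environ.get("example.roles", set()):
            return self.app(environ, start_response)
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [b"forbidden"]


class ExceptionCatcher:
    def __init__(self, app, email_errors):
        self.app, self.email_errors = app, list(email_errors)

    def __call__(self, environ, start_response):
        try:
            return self.app(environ, start_response)
        except Exception:
            # A real version would also email the traceback to self.email_errors.
            start_response("500 Internal Server Error",
                           [("Content-Type", "text/plain")], sys.exc_info())
            return [b"internal error"]


class URLMap:
    """Dispatch on the longest matching path prefix (SCRIPT_NAME not adjusted)."""
    def __init__(self, mapping):
        self.mapping = sorted(mapping.items(), key=lambda kv: len(kv[0]), reverse=True)

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO", "")
        for prefix, app in self.mapping:
            if prefix == "/" or path == prefix or path.startswith(prefix + "/"):
                return app(environ, start_response)
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]


def make_site(email_errors, static_root, blog_database,
              admin_root, user_database, admin_roles):
    # Each component gets only its own, explicitly named parameters.
    admin = Authentication(
        Authorization(make_admin(admin_root), admin_roles), user_database)
    urls = URLMap({"/": make_static(static_root),
                   "/blog": make_blog(blog_database),
                   "/admin": admin})
    return ExceptionCatcher(urls, email_errors)


site = make_site(email_errors=["webmaster@example.com"],
                 static_root="/var/www/html",
                 blog_database="sqlite:///blog.db",
                 admin_root="/var/www/editable",
                 user_database="sqlite:///users.db",
                 admin_roles=["editor"])
```

With a single flat dictionary, the two document_root values and the two database values would collide on one key each; here they become separate parameters. The trade-off is that make_site encodes which middleware wraps which application, which is the internal structure Ian would rather keep out of the deployment file.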

>> So that's the issue I'm concerned about.
> 
> 
> I think the right way to fix it is parameterization; that way you don't 
> push a global (and non type-checkable) namespace down into each 
> component.  Components should have an extremely minimal configuration 
> with fairly specific parameters, because it makes early error checking 
> easier, and you don't have to search all over the place to find how a 
> parameter is used, etc., etc.

If we define schemas for the configuration that components take, that's 
fine with me.  I don't mind being explicit in the design of the 
components.  I just don't want to push all the internal structure into 
the deployment file, and I don't want changes to the design of a 
component to affect the design of anything that might wrap that component.
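A per-component schema of this kind could be as small as a mapping from key to converter and default, checked once when the component is built, which would also give the early error checking mentioned in the quoted text. The following minimal Python sketch assumes that shape; apply_schema, ConfigError, REQUIRED, as_list and the two schemas are hypothetical names, not an existing API.

```python
class ConfigError(ValueError):
    """Raised at startup when a component's configuration is incomplete."""


REQUIRED = object()  # sentinel: no default, the key must be supplied


def as_list(value):
    """Accept a list or a comma-separated string such as 'a@x, b@x'."""
    if isinstance(value, str):
        return [item.strip() for item in value.split(",") if item.strip()]
    return list(value)


def apply_schema(component, schema, config):
    """Pick out and convert the keys `component` declares in `schema`.

    `schema` maps key -> (converter, default).  Keys the component does not
    declare are ignored, so it can be handed a larger shared dictionary.
    """
    values = {}
    for key, (convert, default) in schema.items():
        if key in config:
            values[key] = convert(config[key])
        elif default is REQUIRED:
            raise ConfigError("%s: missing required setting %r" % (component, key))
        else:
            values[key] = convert(default)  # convert defaults too, e.g. () -> []
    return values


# Hypothetical schemas for two components from the example above.
EXCEPTION_CATCHER_SCHEMA = {"email_errors": (as_list, ())}
AUTHORIZATION_SCHEMA = {"allowed_roles": (as_list, REQUIRED)}
```

Because undeclared keys such as smtp_server are ignored rather than rejected, a component can still be handed a larger shared dictionary, while a missing allowed_roles is reported when the component is built rather than on some later request.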

-- 
Ian Bicking  /  ianb at colorstudy.com  / http://blog.ianbicking.org