[Web-SIG] WSGI deployment config

Sun Aug 7 19:16:53 CEST 2005

At 05:23 PM 8/7/2005 +0100, James Gardner wrote:
>This has three advantages:
>
>1. Standardisation between similar WSGI middleware components becomes 
>easier because we could all agree to name standard database connection 
>parameters as database so middleware can be more interoperable. 
>Non-standard extensions can be named in a similar config section but 
>without the database label so that we define an extensible base standard.

The point isn't to have a standardized format or globally accessible 
configuration; it's to *hide* configuration so that other objects don't 
have to know about it.

>2. Configuration can be accessed in code by name eg config.get('database') 
>or config.getAll('database') to get custom extensions too. This means that 
>whatever version of a package you are using you can still refer to the 
>correct configuration easily and also use the configuration file in 
>external scripts eg. to setup necessary database tables etc without 
>creating the full middleware chain.

That doesn't require access to the data as data; the database should just 
be a service.  An example of why: PEAK uses database connection URLs like 
"postgres://foo:bar@example.com/dbname" to designate databases, so it would 
be a step back to force PEAK users to use your user/password/etc. 
configuration scheme in order to be able to interoperate.

It makes more sense, therefore, to have configuration be private to 
components unless those components *want* to share that 
configuration.  However, the form in which they share it might differ from 
the form in which it was supplied.  For example, PEAK has a URL connection 
class that has 
user/password/etc. attributes on it, so it could certainly implement an 
interface to provide that information to components that want it.  But that 
doesn't mean that the *source* configuration was done that way.  Preserving 
a separation between interface and implementation is vital to the 
maintainability of the overall system.
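
As a rough illustration of that separation (this is not PEAK's actual API; 
the ConnectionInfo class and its attribute names are made up for the 
example), a component can be configured with a single URL and still expose 
the individual fields to anyone who asks for them:

```python
from urllib.parse import urlsplit


class ConnectionInfo:
    """Hypothetical interface: exposes user/password/etc. attributes.

    The *source* configuration is a single URL; consumers only ever see
    the attributes, so the input format can change without affecting them.
    """

    def __init__(self, url):
        parts = urlsplit(url)
        self.scheme = parts.scheme
        self.user = parts.username
        self.password = parts.password
        self.host = parts.hostname
        self.dbname = parts.path.lstrip("/")


info = ConnectionInfo("postgres://foo:bar@example.com/dbname")
```

A different implementation could build the same attributes from separate 
user/password settings, and callers wouldn't be able to tell the difference.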

>3. It allows us to create a configuration hierarchy. I've written a WSGI 
>framework named Bricks http://www.pythonweb.org/bricks/ and the way it 
>works is to have a global config file for all applications at a site and 
>then a local config file if the application needs to override global 
>settings or provide extra middleware. The logic behind this is that things 
>like database connections are likely to be used by all applications across 
>a site and a new application you have installed from a third party is not 
>going to have the correct database settings so you would want to use the 
>settings defined in the global config file. Using the new config file 
>format we could simply say that if a global configuration does not already 
>have a named config section which appears in a local config file then the 
>local configuration is added below the last piece of global configuration 
>that matched (or at the end if no matches were found).

I'm -1 on exposing the data as direct configuration.  It should be opaque, 
and accessed as *services*.  Otherwise you're just reinventing the worst 
problems of Zope 2-era design.

We probably *do* need a way to declare services (like your database 
example), and a service discovery API.  We *don't* want to make deployment 
data into directly-accessible configuration.  This doesn't mean you can't 
create service objects whose whole job is to provide configuration data in 
some way; it just means that the deployment parameters themselves should be 
opaque.

The reason for this is that without encapsulation, you get spaghetti 
dependencies, and it becomes difficult to change things programmatically if 
you have no way to influence data dynamically.  This was a really big 
problem in older versions of Zope 2, which encouraged acquisition of random 
configuration properties.  There's really no point in us repeating that 
mistake.

Here's what I'd suggest as an alternative, using a slight syntax tweak:

    [sql service from somedbpackage]
    conn = "some://url"    # or you can do it the awkward way instead
    # ... etc.

So "service" or "service from" are the keywords to define a service.  For 
"service from", the first part is looked up in a wsgi.service_factories 
entry point group.  For "service", it's just imported.  Either way, the 
factory is invoked with the previous service provider to create a kind of 
"service chain".  The current head of the service chain is passed into 
middleware and application factories as the first parameter, so they can 
use it to find services.
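
To make the mechanics a bit more concrete, here is a minimal sketch of how 
a loader might build such a chain.  Everything in it is an assumption for 
illustration: the function names, the ExampleService class, the second 
section ("somecachepackage"), and the "resolve" callable, which stands in 
for the entry point lookup or import step described above.

```python
def parse_header(header):
    """Split a section header into (first_part, source).

    source is None for the plain "service" form.
    """
    words = header.split()
    if len(words) == 2 and words[1] == "service":
        return words[0], None
    if len(words) == 4 and words[1] == "service" and words[2] == "from":
        return words[0], words[3]
    raise ValueError("not a service section: %r" % (header,))


class ExampleService:
    """Hypothetical service; its settings stay private to the service."""

    def __init__(self, previous, **settings):
        self.previous = previous      # link to the rest of the chain
        self._settings = settings     # not exposed to other components


def build_chain(sections, resolve):
    """Invoke each factory with the previous provider; return the head."""
    head = None
    for header, settings in sections:
        factory = resolve(*parse_header(header))
        head = factory(head, **settings)
    return head


def app_factory(services):
    """Application factory: the chain head arrives as the first parameter."""
    def app(environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"ok"]
    app.services = services  # kept only so the example can be inspected
    return app


sections = [
    ("sql service from somedbpackage", {"conn": "some://url"}),
    ("cache service from somecachepackage", {}),   # hypothetical section
]
# A real loader would consult wsgi.service_factories or import the name;
# here every header resolves to the same stand-in factory.
head = build_chain(sections, lambda first_part, source: ExampleService)
application = app_factory(head)
```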

We then define a simple API for walking the service chain and locating 
services by name or other keys.  This approach is capable of doing 
everything you've proposed, except that it doesn't provide access to the 
private configuration data of individual services.  It would be possible, 
however, to load the service chain from a deployment file without 
instantiating applications or middleware, in order to e.g. run utility 
programs.  You can still include arbitrary configuration if you want, just 
by creating a service whose job is to provide such information.
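
A sketch of what that lookup API could look like, assuming (purely for the 
example) that each service carries a "name" attribute and a "previous" 
link; walk() and find_service() are hypothetical names, not an agreed 
interface:

```python
class Service:
    """Hypothetical chain link with a name and a pointer to its predecessor."""

    def __init__(self, previous, name):
        self.previous = previous
        self.name = name


class SettingsService(Service):
    """A service whose whole job is to hand out configuration data."""

    def __init__(self, previous, name, **settings):
        super().__init__(previous, name)
        self._settings = dict(settings)

    def get(self, key, default=None):
        return self._settings.get(key, default)


def walk(head):
    """Yield services from the head of the chain back to the first one."""
    while head is not None:
        yield head
        head = head.previous


def find_service(head, name):
    """Return the service nearest the head with the given name."""
    for service in walk(head):
        if service.name == name:
            return service
    raise LookupError("no service named %r" % (name,))


# Built without instantiating any application or middleware, as a utility
# script might do.
head = Service(None, "sql")
head = SettingsService(head, "settings", site_title="Example")
head = Service(head, "cache")

names = [service.name for service in walk(head)]
title = find_service(head, "settings").get("site_title")
```

Because lookup starts at the head, a service declared later in the file 
shadows an earlier one with the same name.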

The only other piece I think we're missing is a way to handle branching, 
because our pipeline configuration is quite linear.  There's no obvious way 
to branch at the moment, except by having a way to configure a middleware 
component to refer to other pipelines.
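
One possible shape for such a component, sketched under the assumption that 
branching is done on URL path prefix (the make_branch name and the prefix 
rule are illustrative only, and the sketch doesn't adjust SCRIPT_NAME or 
PATH_INFO the way a real dispatcher probably should):

```python
def make_branch(pipelines, default):
    """Return a WSGI app that hands the request to one of several pipelines.

    pipelines maps a path prefix such as "/blog" to a WSGI callable.
    """
    # Try longer prefixes first so "/blog/admin" wins over "/blog".
    ordered = sorted(pipelines.items(), key=lambda item: len(item[0]),
                     reverse=True)

    def branch(environ, start_response):
        path = environ.get("PATH_INFO", "")
        for prefix, pipeline in ordered:
            if path == prefix or path.startswith(prefix + "/"):
                return pipeline(environ, start_response)
        return default(environ, start_response)

    return branch


def make_app(body):
    """Tiny stand-in for a fully configured pipeline."""
    def app(environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [body]
    return app


branch = make_branch(
    {"/blog": make_app(b"blog"), "/shop": make_app(b"shop")},
    make_app(b"default"),
)


def call(app, path):
    """Invoke a WSGI app with a minimal environ and collect the body."""
    return b"".join(app({"PATH_INFO": path}, lambda status, headers: None))
```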