[Web-SIG] Standardized configuration

Sat Jul 23 00:26:01 CEST 2005

Chris McDonough wrote:
> I've had a stab at creating a simple WSGI deployment implementation.
> I use the term "WSGI component" in here as shorthand to indicate all
> types of WSGI implementations (server, application, gateway).
> 
> The primary deployment concern is to create a way to specify the
> configuration of an instance of a WSGI component, preferably within a
> declarative configuration file.  A secondary deployment concern is to
> create a way to "wire up" components together into a specific
> deployable "pipeline".  
> 
> Below is a strawman implementation that solves both issues via the
> "configurator", which would presumably live in "wsgiref".  Currently
> it lives in a package named "wsgiconfig" on my laptop.  This module
> follows.

I have a weird problem reading unhighlighted source.  I dunno why.  But 
anyway, the configuration file is what interests me most...

>   To do this, we use a ConfigParser-format config file named
>   'myapplication.conf' that looks like this::
> 
>     [application:sample1]
>     config = sample1.conf
>     factory = wsgiconfig.tests.sample_components.factory1
> 
>     [application:sample2]
>     config = sample2.conf
>     factory = wsgiconfig.tests.sample_components.factory2
> 
>     [pipeline]
>     apps = sample1 sample2

I think it's confusing to call both these applications.  I think 
"middleware" or "filter" would be better.  I think people understand 
"filter" far better, so I'm inclined to use that.  So...

[application:sample2]
# What is this relative to?  I hate both absolute paths and
# paths relative to pwd equally...
config = sample2.conf
factory = wsgiconfig...

[filter:sample1]
config = sample1.conf
factory = ...

[pipeline]
# The app is unique and special...?
app = sample2
filters = sample1
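A minimal sketch of how a loader might assemble this refactored pipeline. It assumes filters are listed outermost-first, that each filter factory returns a callable taking "the next" application, and that factories are looked up in a hypothetical in-memory registry instead of being imported from the dotted `factory` paths (which stay elided above). The `config` keys are left out for brevity.

```python
import configparser

CONF = """
[application:sample2]
factory = hello_app

[filter:sample1]
factory = header_filter

[pipeline]
app = sample2
filters = sample1
"""

# Hypothetical factories standing in for the dotted 'factory' paths.
def hello_app(section):
    def app(environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"hello"]
    return app

def header_filter(section):
    # A filter factory returns a callable that wraps "the next" application.
    def wrap(next_app):
        def app(environ, start_response):
            def start(status, headers, exc_info=None):
                headers.append(("X-Filtered", "yes"))
                return start_response(status, headers, exc_info)
            return next_app(environ, start)
        return app
    return wrap

REGISTRY = {"hello_app": hello_app, "header_filter": header_filter}

def load_pipeline(text):
    parser = configparser.ConfigParser()
    parser.read_string(text)
    pipeline = parser["pipeline"]
    app_section = parser["application:" + pipeline["app"]]
    app = REGISTRY[app_section["factory"]](app_section)
    # Wrap innermost-first so the first listed filter ends up outermost.
    for name in reversed(pipeline.get("filters", "").split()):
        filter_section = parser["filter:" + name]
        app = REGISTRY[filter_section["factory"]](filter_section)(app)
    return app
```

The single `app` key makes the "unique and special" terminal application explicit, while any number of filters can be stacked around it.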

Well, that's just a first refactoring; I'm having other inclinations...

> Potential points of contention
> 
>  - The WSGI configurator assumes that you are willing to write WSGI
>    component factories which accept a filename as a config file.  This
>    factory returns *another* factory (typically a class) that accepts
>    "the next" application in the pipeline chain and returns a WSGI
>    application instance.  This pattern is necessary to support
>    argument currying across a declaratively configured pipeline,
>    because the WSGI spec doesn't allow for it.  This is more contract
>    than currently exists in the WSGI specification but it would be
>    trivial to change existing WSGI components to adapt to this
>    pattern.  Or we could adopt a pattern/convention that removed one
>    of the factories, passing both the "next" application and the
>    config file into a single factory function.  Whatever.  In any
>    case, in order to do declarative pipeline configuration, some
>    convention will need to be adopted.  The convention I'm advocating
>    above seems to have already been adopted by the current crop of
>    middleware components (using a factory which accepts the application
>    as the first argument).
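The two conventions in the quoted paragraph can be sketched side by side. All names here (`LoggerMiddleware`, `make_logger`, `make_logger_single`, `two_step_from_single`) are hypothetical; the point is that the two-step form curries the config filename before the next application is known, and a small adapter converts one form into the other.

```python
from functools import partial

class LoggerMiddleware:
    """Hypothetical middleware configured from a file name."""
    def __init__(self, next_app, config_filename):
        self.next_app = next_app
        self.config_filename = config_filename

    def __call__(self, environ, start_response):
        environ["example.config"] = self.config_filename
        return self.next_app(environ, start_response)

# Two-step convention: factory(config_filename) -> callable(next_app) -> app.
def make_logger(config_filename):
    return partial(LoggerMiddleware, config_filename=config_filename)

# One-step convention: factory(next_app, config_filename) -> app.
def make_logger_single(next_app, config_filename):
    return LoggerMiddleware(next_app, config_filename)

def two_step_from_single(single_factory):
    """Adapt a one-step factory to the two-step convention."""
    def factory(config_filename):
        return lambda next_app: single_factory(next_app, config_filename)
    return factory
```

Because the adapter is trivial, the choice between the two conventions is mostly about which one the loader calls, not about what components can express.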

I hate the proliferation of configuration files this implies.  I 
consider the filters an implementation detail; if they each have 
partitioned configuration then they become a highly exposed piece of the 
architecture.

It's also a lot of management overhead.  Typical middleware takes 0-5 
configuration parameters.  For instance, paste.profilemiddleware is 
perfectly usable with no configuration at all, and only has two parameters.

But this is reasonably easy to resolve -- there's a perfectly good 
configuration section sitting there, waiting to be used:

   [filter:profile]
   factory = paste.profilemiddleware.ProfileMiddleware
   # Show top 50 functions:
   limit = 50
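One way a loader could hand the extra keys in a section to the factory, assuming `factory` is the only reserved name and every other key is passed through as a string keyword argument. `ProfileMiddleware` below is a hypothetical stand-in, not Paste's actual class or signature; note that it must convert `limit` from a string itself.

```python
import configparser

CONF = """
[filter:profile]
factory = ProfileMiddleware
# Show top 50 functions:
limit = 50
"""

RESERVED = {"factory"}  # names the loader keeps for itself

class ProfileMiddleware:
    """Hypothetical stand-in; usable with no configuration at all."""
    def __init__(self, app, limit="40"):
        self.app = app
        self.limit = int(limit)  # configuration values arrive as strings

    def __call__(self, environ, start_response):
        return self.app(environ, start_response)

REGISTRY = {"ProfileMiddleware": ProfileMiddleware}

def load_filter(parser, name, next_app):
    section = parser["filter:" + name]
    factory = REGISTRY[section["factory"]]
    options = {k: v for k, v in section.items() if k not in RESERVED}
    return factory(next_app, **options)
```

The `RESERVED` set is exactly where the naming conflict in the next paragraph would appear: any special name added to it later can no longer be used as an ordinary parameter.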

This in no way precludes 'config', which is just a special case of this 
general configuration.  The only real problem is a possible conflict if 
we wanted to add new special names to the configuration, i.e., 
meta-filter-configuration.

Another option is indirection like:

   [filter:profile]
   factory = paste.profilemiddleware.ProfileMiddleware

   [config:profile]
   limit = 50

If we do something like this, the interface for these factories does 
become larger, as we're passing in objects that are more complex than 
strings.
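With the `[config:...]` indirection, a loader might pass the factory a whole section object instead of loose strings, which is the larger interface described above. A sketch using `configparser`'s `SectionProxy`; the rule that `[config:NAME]` belongs to `[filter:NAME]` is an assumption, and `ProfileMiddleware` is again hypothetical.

```python
import configparser

CONF = """
[filter:profile]
factory = ProfileMiddleware

[config:profile]
limit = 50
"""

class ProfileMiddleware:
    """Hypothetical factory that receives a mapping-like section object."""
    def __init__(self, app, config):
        self.app = app
        # SectionProxy.getint() converts for us; None means "not configured".
        self.limit = config.getint("limit", fallback=40) if config is not None else 40

    def __call__(self, environ, start_response):
        return self.app(environ, start_response)

REGISTRY = {"ProfileMiddleware": ProfileMiddleware}

def load_filter(parser, name, next_app):
    factory = REGISTRY[parser["filter:" + name]["factory"]]
    config_name = "config:" + name
    config = parser[config_name] if parser.has_section(config_name) else None
    return factory(next_app, config)
```

The factory no longer has to worry about reserved names, but it now depends on `configparser` types rather than plain strings.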

Another thing this could allow is recursive configuration, like:

[application:urlmap]
factory = paste.urlmap.URLMapBuilder
app1 = blog
app1.url = /
app2 = statview
app2.url = /stats
app3 = cms
app3.host = dev.*

[application:blog]
factory = leonardo.wsgifactory
config = myblog.conf

[application:statview]
factory = statview
log_location = /var/logs/apache2

[application:cms]
factory = proxy
location = http://localhost:8080
map = / /cms.php

[pipeline]
app = urlmap

So URLMapBuilder needs the entire configuration file passed in, along 
with the name of the section it is building.  It then reads some keys, 
and builds some named applications, and creates an application that 
delegates based on patterns.  That's the kind of configuration file I 
could really use.
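A rough sketch of the builder described above. It assumes the builder receives the parsed file plus its own section name, that `appN.url` is a path prefix and `appN.host` a shell-style host pattern, and that referenced applications are built recursively through the same loader. Everything here (`build_app`, `url_map_builder`, the `text` factory standing in for the blog, statview, and proxy apps) is hypothetical and is not the API of `paste.urlmap`.

```python
import configparser
import fnmatch
import re

CONF = """
[application:urlmap]
factory = urlmap
app1 = blog
app1.url = /
app2 = statview
app2.url = /stats
app3 = cms
app3.host = dev.*

[application:blog]
factory = text
text = blog

[application:statview]
factory = text
text = stats

[application:cms]
factory = text
text = cms
"""

def text_app(parser, name):
    body = parser["application:" + name]["text"].encode()
    def app(environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [body]
    return app

def url_map_builder(parser, name):
    section = parser["application:" + name]
    routes = []  # (host pattern or None, path prefix, app)
    for key in section:
        if re.fullmatch(r"app\d+", key):
            app = build_app(parser, section[key])  # recursive build
            routes.append((section.get(key + ".host"),
                           section.get(key + ".url", "/"), app))
    # Host-specific routes first, then longest path prefix first.
    routes.sort(key=lambda r: (r[0] is None, -len(r[1])))

    def dispatch(environ, start_response):
        host = environ.get("HTTP_HOST", "").split(":")[0]
        path = environ.get("PATH_INFO", "") or "/"
        for host_pattern, prefix, app in routes:
            if host_pattern is not None and not fnmatch.fnmatch(host, host_pattern):
                continue
            if path.startswith(prefix):
                return app(environ, start_response)
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    return dispatch

FACTORIES = {"urlmap": url_map_builder, "text": text_app}

def build_app(parser, name):
    factory = FACTORIES[parser["application:" + name]["factory"]]
    return factory(parser, name)
```

A real mapper would also move the matched prefix from `PATH_INFO` into `SCRIPT_NAME`; this sketch only dispatches.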

Of course, if I really wanted this I could implement:

[application:configurable]
factory = paste.configurable_pipeline
conf = abetterconffile.conf

But then the configuration file becomes a dummy configuration, and no 
one else gets to use my fancier middleware with the normal configuration 
file.

>  - Pipeline deployment configuration should be used only to configure
>    essential information about pipeline and individual pipeline
>    components.  Where complex service data configuration is necessary,
>    the component which implements a service should provide its own
>    external configuration mechanism.  For example, if an XSL service
>    is implemented as a WSGI component, and it needs configuration
>    knobs of some kind, these knobs should not live within the WSGI
>    pipeline deployment file.  Instead, each component should have its
>    own configuration file.  This is the purpose (undemonstrated above)
>    of allowing an [application] section to specify a config filename.

Whenever the configuration refers to filenames, finding those files 
intelligently is important to me.  The working directory is, IMHO, 
fragile and unreliable.  Absolute paths are reliable but fragile.
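One common answer to "what is this relative to?" (the question in the comment in the first refactored example) is to resolve relative paths against the directory of the file that mentions them, not the working directory. A minimal sketch assuming POSIX-style paths; `resolve_config_path` is a hypothetical name.

```python
import os

def resolve_config_path(referencing_file, path):
    """Resolve 'path' relative to the file that refers to it, not the cwd."""
    path = os.path.expanduser(path)
    if os.path.isabs(path):
        return os.path.normpath(path)
    base = os.path.dirname(os.path.abspath(referencing_file))
    return os.path.normpath(os.path.join(base, path))
```

Moving a deployment directory then keeps working without edits, and absolute paths remain available where they are really wanted.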

In some cases module names are a more robust way of locating resources, 
if those modules are self-describing applications, mostly because 
there's a search path.  Several projects encourage this kind of system, 
though I'm not particularly fond of it because it mixes 
installation-specific files with code.
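Locating a resource through a module name lets Python's import search path do the finding. A sketch using `importlib.util.find_spec`; the `module:relative/path` syntax and the function name are assumptions, not an existing convention.

```python
import importlib.util
import os

def resolve_module_resource(spec_string):
    """Resolve 'package.module:relative/path' via the import search path."""
    module_name, _, relative = spec_string.partition(":")
    spec = importlib.util.find_spec(module_name)
    if spec is None or not spec.has_location:
        raise LookupError("cannot locate module %r on disk" % module_name)
    base = os.path.dirname(spec.origin)
    return os.path.join(base, relative)
```

This illustrates the trade-off mentioned above: the lookup is robust, but the resource has to live next to the installed code.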

>  - Some people seem to be arguing that there should be a single
>    configuration format across all WSGI applications and gateways to
>    configure everything about those components.  I don't think this is
>    workable.  I think the only thing that is workable is to recommend
>    to WSGI component authors that they make their components
>    configurable using some configuration file or other type of path
>    (URL, perhaps).  The composition, storage, and format of all other
>    configuration data for the component should be chosen by the
>    author.

While I appreciate the difficulty of agreeing on a configuration format, 
the way this proposal avoids that is by underpowering the deployment 
file so that authors are forced to create other configuration files.

>  - Threads which discussed this earlier on the web-sig list included
>    the idea that a server or gateway should be able to "find" an
>    end-point application based on a lookup of source file/module +
>    attrname specified in the server's configuration.  I'm suggesting
>    instead that the mapping between servers, gateways, and
>    applications be a pipeline and that the pipeline itself have a
>    configuration definition that may live outside of any particular
>    server, gateway, or application.  The pipeline definition(s) would
>    wire up the servers, gateways, and applications itself.  The
>    pipeline definition *could* be kept amongst the files representing a
>    particular server instance on the filesystem (and this might be the
>    default), but it wouldn't necessarily have to be.  This might just
>    be semantics.

I think it's mostly semantics.

>  - There were a few mentions of being able to configure/create a WSGI
>    application at request time by passing name/value string pairs
>    "through the pipeline" that would ostensibly be used to create a
>    new application instance (thereby dynamically extending or
>    modifying the pipeline).  I think it's fine if a particular
>    component does this, but I'm suggesting that a canonization of the
>    mechanism used to do this is not necessary and that it's useful to
>    have the ability to define static pipelines for deployment.

It does matter to me that we allow for dynamic systems.  A dynamic system 
allows for more levels of abstraction in deployment, meaning more 
potential for automation.

I think this can be achieved simply by defining a standard based on the 
object interface, where the configuration file itself is a reference 
implementation (that we expect people will usually use).  Semantics from 
the configuration file will leak through, but it's a lot easier to deal 
with (for example) a system that can only support string configuration 
values than with a system based on concrete files in a specific format.
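A sketch of what an interface-first standard might look like: factories accept only string-valued options, so a pipeline can come from a config file or be generated programmatically (say, by a deployment tool) with no file at all. The `factory(next_app, **options)` shape, `build_from_dicts`, and both example components are assumptions for illustration.

```python
from typing import Callable, Dict, List, Tuple

WSGIApp = Callable
Options = Dict[str, str]  # the standard would promise string values only

def greeting_app(message: str = "hello") -> WSGIApp:
    body = message.encode()
    def app(environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [body]
    return app

def upper_filter(next_app: WSGIApp, enabled: str = "true") -> WSGIApp:
    # The component, not the loader, interprets its string options.
    if enabled.lower() not in ("true", "yes", "1"):
        return next_app
    def app(environ, start_response):
        return [chunk.upper() for chunk in next_app(environ, start_response)]
    return app

def build_from_dicts(app_factory: Callable, app_options: Options,
                     filters: List[Tuple[Callable, Options]]) -> WSGIApp:
    """filters: (factory, options) pairs, outermost first."""
    app = app_factory(**app_options)
    for factory, options in reversed(filters):
        app = factory(app, **options)
    return app
```

A config-file loader then becomes just one producer of these dictionaries, which is the sense in which the file is a reference implementation rather than the standard itself.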

>  - If elements in the pipeline depend on "services" (ala
>    Paste-as-not-a-chain-of-middleware-components), it may be
>    advantageous to create a "service manager" instead of deploying
>    each service as middleware.  The "service manager" idea is not a
>    part of the deployment spec.  The service manager would itself
>    likely be implemented as a piece of middleware or perhaps just a
>    library.

That might be best.  It's also quite possible for the factory to 
instantiate more middleware.
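For instance, a single factory could assemble a small stack internally, so the deployment file names only one filter. `services_factory` and both inner middlewares are hypothetical.

```python
def add_header(app, name, value):
    def wrapped(environ, start_response):
        def start(status, headers, exc_info=None):
            return start_response(status, headers + [(name, value)], exc_info)
        return app(environ, start)
    return wrapped

def require_method(app, allowed):
    def wrapped(environ, start_response):
        if environ.get("REQUEST_METHOD", "GET") not in allowed:
            start_response("405 Method Not Allowed",
                           [("Content-Type", "text/plain")])
            return [b"method not allowed"]
        return app(environ, start_response)
    return wrapped

def services_factory(next_app, methods="GET HEAD", powered_by="example"):
    """Hypothetical factory that wraps next_app in several middlewares."""
    app = require_method(next_app, set(methods.split()))
    app = add_header(app, "X-Powered-By", powered_by)
    return app
```

The inner layers stay an implementation detail of the factory, which fits the view that filters should not each become an exposed piece of the deployment.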

-- 
Ian Bicking  /  ianb at colorstudy.com  / http://blog.ianbicking.org