[Web-SIG] wsgiconfig design

Sat Jul 7 21:01:03 CEST 2007

Jim Fulton wrote:
> 
> On Jul 6, 2007, at 11:41 PM, Ian Bicking wrote:
> 
>> Every so often I get in this cleanup/redux mood where I feel a need to
>> revisit things I've done before in an attempt to Do Them Right.
>>
>> We've discussed Paste Deploy here before, and I'm thinking about Redoing
>> It Right.
> 
> Cool.
> 
>> I thought I'd share some thoughts on the design:
>>
>> I still am quite happy with the entry points from Paste Deploy, and plan
>> to keep them,
> 
> Me too.  In fact, I'd really like them to have their own identity 
> independent of Paste Deploy. People should start supporting these entry 
> points even if they use some other application to put them together.

I've tried to encourage people to use this, but they get stuck on the
word "paste", so there aren't many other people who consume or produce
these entry points except for use with Paste or related packages
(Pylons, etc.).  I'm not sure what to do about that, except perhaps to
reset people's opinions with this rewrite.

> I do have one potential complaint about the entry-point APIs.  The 
> applications my company builds have configurations that are too complex 
> to fit in a single config-parser section.  To handle these 
> configurations, I'd need to be able to read multiple sections, or to 
> refer to an external configuration.  I think the latter is the current 
> recommended approach for Paste Deploy.  If you want to keep that 
> approach, then the existing entry-point APIs are fine.  (I personally 
> want to be able to put all of my configuration in a single file, but 
> zc.buildout lets me do that, so I don't need Paste Deploy to do that for 
> me.)

As I've been coding, I've actually been thinking about passing a
complete config dictionary in instead of global_conf.  So it would be
like {section: {option: value, ...}}.  This lets you look into other
applications' sections, which maybe isn't ideal.

Another option that occurs to me now might be something like
[config:app_name section_name], and then pass in section_name={dict of
options} as a keyword argument.  I.e.:

   [/]
   use = egg:MyPackage
   greeting = Hello

   [config:/ email]
   smtp_server = localhost
   email = bob@example.com

That leads to mypackage.wsgi_app(global_conf, greeting="Hello",
email={'smtp_server': 'localhost', 'email': 'bob@example.com'})

I think I prefer the latter.
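A minimal sketch of the [config:app_name section_name] form, using the standard library's configparser as a stand-in for INITools. collect_app_kwargs is a hypothetical helper; it only shows each extra section becoming a dict passed as a keyword argument, as in the mypackage.wsgi_app call above.

```python
import configparser


def collect_app_kwargs(parser, app_name):
    """Keyword arguments an app factory would receive (hypothetical helper).

    Options in [app_name] become plain keyword arguments ("use" names the
    factory itself, so it is skipped); each [config:app_name EXTRA] section
    becomes a dict passed as EXTRA=...
    """
    kwargs = {k: v for k, v in parser.items(app_name) if k != "use"}
    prefix = "config:" + app_name + " "
    for section in parser.sections():
        if section.startswith(prefix):
            kwargs[section[len(prefix):].strip()] = dict(parser.items(section))
    return kwargs


SAMPLE = """
[/]
use = egg:MyPackage
greeting = Hello

[config:/ email]
smtp_server = localhost
email = bob@example.com
"""

parser = configparser.ConfigParser(interpolation=None)
parser.read_string(SAMPLE)
print(collect_app_kwargs(parser, "/"))
# {'greeting': 'Hello', 'email': {'smtp_server': 'localhost', 'email': 'bob@example.com'}}
```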

>> I will probably rename them, but also support the old Paste Deploy 
>> names.
> 
> Why?

Why rename?  To go with the new package, and because I might change the
entry points slightly (particularly global_conf or those extra config
sections).  I'd support the old names because I can, even if they are
somewhat different APIs; a nice side effect of explicit entry point groups.

>> There will be a few levels of pluggability.  The first is the loader
>> itself, which will default to the wsgiconfig loader itself.  This is
>> only applicable to file-based configuration.  When you load up the
>> application from a file, it will search for a #! line that specifies a
>> loader, which is the name of a distribution or module.  It will search
>> the distribution for some entry point (wsgiconfig.config_loader:main),
>> and that entry point is a callable like:
>>
>>    def config_loader(filename, options, object_type):
>>        return wsgi_application
> 
> which section would it get these from?

Which entry point group?  The group is [wsgiconfig.config_loader].  The
loader's name is taken from the #! line, so you don't have to use ini
format at all if you don't want to.  This is to placate people who
really hate that format ;)
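A sketch of the #! lookup, assuming the loader name sits alone on the first line and names an installed distribution. find_loader_name, load_config_loader, the default name "wsgiconfig", and the MyYamlLoader example are hypothetical; the module-name case is not handled here.

```python
from importlib.metadata import distribution

DEFAULT_LOADER = "wsgiconfig"  # hypothetical name of the built-in ini loader


def find_loader_name(text, default=DEFAULT_LOADER):
    """Return the loader named on a leading '#!' line, else the default."""
    lines = text.splitlines()
    if lines and lines[0].startswith("#!"):
        name = lines[0][2:].strip()
        if name:
            return name
    return default


def load_config_loader(dist_name):
    """Load the 'main' entry point in group 'wsgiconfig.config_loader'.

    Raises importlib.metadata.PackageNotFoundError if dist_name isn't installed.
    """
    for ep in distribution(dist_name).entry_points:
        if ep.group == "wsgiconfig.config_loader" and ep.name == "main":
            # Expected to be config_loader(filename, options, object_type).
            return ep.load()
    raise LookupError(f"{dist_name} has no wsgiconfig.config_loader:main entry point")


print(find_loader_name("#! MyYamlLoader\napp: ...\n"))  # MyYamlLoader
print(find_loader_name("[/]\nuse = egg:MyApp\n"))        # wsgiconfig
```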

>> options is a flat dictionary of options that are passed in, which the
>> config loader can use at its discretion.  A common way to use it would
>> be for variable substitution.  This allows for things like "paster serve
>> config.ini var1=value var2=value", for ad hoc customization.  It returns
>> a single application.  (This does mean that a feature from Paste Deploy
>> is lost, where you could do "paster serve config.ini#section_name" to
>> load a specific application from several defined in a file -- but you
>> are less likely to get dead sections or confusing config file 
>> factorings).
>>
>> object_type is the kind of object we want to get out.  Here I'll only
>> specify 'wsgi.application'.  'wsgi.server' will probably also be
>> implemented, but that's all I plan for now.  'wsgi.appserver' or
>> something might be possible, for the process manager that runs an entire
>> application.
> 
> I don't really follow this. Maybe an example would help.

Well, let's say you have a configuration like:

   [/]
   use = egg:MyApp

   [middleware:/]
   use = egg:Paste#profile

   [server:main] # or maybe just server?
   use = egg:Paste#http
   host = 127.0.0.1:${port}

Then you start it up with "serve config.ini port=8090".  That's the idea
of the options dictionary, it holds {'port': '8090'}.
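The options dictionary is just the name=value pairs from the command line. In this sketch string.Template stands in for whatever substitution INITools ends up doing (that part is an assumption); parse_cli_options and substitute are hypothetical names.

```python
from string import Template


def parse_cli_options(args):
    """Turn ['port=8090', ...] from the command line into {'port': '8090', ...}."""
    options = {}
    for arg in args:
        name, sep, value = arg.partition("=")
        if not sep or not name:
            raise ValueError(f"expected name=value, got {arg!r}")
        options[name] = value
    return options


def substitute(value, options):
    """Expand ${name} references in a config value; unknown names raise KeyError."""
    return Template(value).substitute(options)


# serve config.ini port=8090
options = parse_cli_options(["port=8090"])
print(substitute("127.0.0.1:${port}", options))  # 127.0.0.1:8090
```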

When you load the config file, all the applications defined in the
config are collected and combined, by default using urlmap.  There's
only one in this example, '/'.  So what gets returned is basically:

   URLMap({'/': profile_middleware(MyApp_app(global_conf), global_conf)})
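To make that composition concrete, here is a much-simplified stand-in for URLMap: the longest path prefix wins, SCRIPT_NAME/PATH_INFO are shifted for the mounted app, and there is no host-based dispatch. prefix_map, tag_middleware and hello_app are hypothetical; tag_middleware only plays the role of the profile middleware.

```python
def prefix_map(apps):
    """Compose {path_prefix: wsgi_app} into one WSGI app (longest prefix wins)."""
    # '/' becomes '' so it matches every path; try longer prefixes first.
    mounts = sorted(((prefix.rstrip("/"), app) for prefix, app in apps.items()),
                    key=lambda item: len(item[0]), reverse=True)

    def dispatch(environ, start_response):
        path = environ.get("PATH_INFO", "")
        for prefix, app in mounts:
            if path == prefix or path.startswith(prefix + "/"):
                environ = dict(environ)  # don't mutate the caller's environ
                environ["SCRIPT_NAME"] = environ.get("SCRIPT_NAME", "") + prefix
                environ["PATH_INFO"] = path[len(prefix):]
                return app(environ, start_response)
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not Found"]

    return dispatch


def tag_middleware(app, global_conf):
    """Stand-in for a middleware like profile: adds a response header."""
    def wrapped(environ, start_response):
        def tagged_start_response(status, headers, exc_info=None):
            return start_response(status, headers + [("X-Middleware", "tag")], exc_info)
        return app(environ, tagged_start_response)
    return wrapped


def hello_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [environ["PATH_INFO"].encode("utf-8")]


global_conf = {}
application = prefix_map({
    "/": tag_middleware(hello_app, global_conf),
    "/blog": hello_app,
})
```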

>> Unlike Paste Deploy, section names will not be arbitrary.  A section
>> name has a prefix and name.  The prefix, as in Paste Deploy, says what
>> you are describing.  The default prefix is "app:"; you can also give
>> "middleware:".  Prefixes not recognized will be ignored.  A possible
>> prefix might be "logging:" for logging, which if I don't implement it
>> will be initially ignored (but someone else could handle it).
> 
> IMO, it would be nice *not* to reinvent yet another logging 
> configuration handler. The standard library already defines one. If we 
> don't like it, we should make it better.

I don't like it, but I don't feel like improving it either ;)  Anyway,
this is basically just a convention to group all the sections together
for logging based on that prefix.  The logging module's configuration
handler can handle it, or we could wrap it slightly (if you loaded
logging you wouldn't actually be returning anything, you'd be updating
the global logging configuration, which may or may not be what we want).
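One way the "logging:" convention could map onto the standard library: strip the prefix and hand the sections to logging.config.fileConfig, which accepts a ConfigParser instance (Python 3.4+). configure_logging is a hypothetical name, and the section names after the prefix must follow fileConfig's own layout ([loggers], [handler_NAME], and so on). It updates global logging state rather than returning an object.

```python
import configparser
import logging
import logging.config


def configure_logging(parser, prefix="logging:"):
    """Pass every [logging:NAME] section to fileConfig as [NAME].

    Returns True if any logging sections were found. Side effect: replaces
    the process-wide logging configuration.
    """
    logging_parser = configparser.ConfigParser(interpolation=None)
    for section in parser.sections():
        if section.startswith(prefix):
            # raw=True keeps '%(message)s'-style format strings uninterpolated.
            logging_parser[section[len(prefix):]] = dict(parser.items(section, raw=True))
    if not logging_parser.sections():
        return False
    logging.config.fileConfig(logging_parser, disable_existing_loggers=False)
    return True


CONFIG = """
[/]
use = egg:MyApp

[logging:loggers]
keys = root

[logging:handlers]
keys = console

[logging:formatters]
keys = plain

[logging:logger_root]
level = INFO
handlers = console

[logging:handler_console]
class = StreamHandler
args = (sys.stderr,)
formatter = plain

[logging:formatter_plain]
format = %(levelname)s %(name)s: %(message)s
"""

parser = configparser.ConfigParser(interpolation=None)
parser.read_string(CONFIG)
configure_logging(parser)
logging.getLogger("myapp").info("configured")
```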

>>   Similarly
>> the server as with Paste Deploy can be defined with "server:".  For now
>> all we're concerned with is applications, middleware, and composites.
> 
> Maybe I'm missing your point, but I thought the value of Paste Deploy 
> was to be able to have a way to define an end-to-end configuration of 
> applications, middleware and server.

The server is a little bit of an outlier.  The applications and
middleware can be composed directly and fairly opaquely, but the server 
needs to be connected to the application more explicitly and outside of 
wsgiconfig.  OTOH, it's real handy to be able to put the server section 
in the same config file.
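A sketch of why the server sits apart: a server section can only produce something that runs an application, and the caller must hand it the composed app explicitly. This uses the standard library's wsgiref; http_server_factory and the host = name:port parsing (mirroring the host = 127.0.0.1:${port} example) are hypothetical.

```python
from wsgiref.simple_server import make_server


def http_server_factory(global_conf, host="127.0.0.1:8080"):
    """Hypothetical [server:...] factory: returns serve(app); it loads no app itself."""
    hostname, _, port = host.rpartition(":")
    port = int(port)  # fail at config time on e.g. "host = localhost"
    if not hostname:
        raise ValueError(f"host must look like name:port, got {host!r}")

    def serve(app):
        # The loader builds the app and the server separately, then connects them here.
        httpd = make_server(hostname, port, app)
        httpd.serve_forever()

    return serve


# Usage (blocks forever):
# serve = http_server_factory({}, host="127.0.0.1:8090")
# serve(application)
```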

>> The applications and middleware are grouped together using the names.
>> That is, if you have an application "/" and a middleware "middleware:/",
>> then the middleware wraps that application.  Middleware sections can
>> have trailing numbers to indicate ordering and keep section names
>> unique.  Thus "middleware:/ 1", "middleware:/ 2", etc.  Negative numbers
>> and floats are allowed.  Anything but trailing numbers is considered
>> part of the name; thus names can have parameters or other structure.
> 
> Hm.  Sounds a bit too magic to me.  Maybe an example will make it look 
> better. :)

Well, what we are trying to create is a basic 
middleware1(middleware2(app)) composition, where the app is required and 
the middleware is not.

We group these together by name; with urlmap, that name is a path.  So / 
is the main app, /blog is the app mounted at /blog, etc.  Then we need 
an ordered list of the middleware to apply.  There needs to be some way 
to distinguish a middleware section from an application section, hence 
middleware:.  And then a way of ordering them.  We could use the section 
ordering, except duplicate section names are no good anyway, even if we 
did keep track of the order they were defined in.  So I'm proposing a 
trailing number.

The rest of the name is significant.  For instance, a composer might 
take a section name like [/ domain=foo.com], and use that entire section 
name to determine that it's mapped to a specific vhost.  The middleware 
would have to use exactly that same name, [middleware:/ domain=foo.com].
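The naming rules could be parsed roughly like this. It is a sketch: the regexes, treating unnumbered middleware as order 0, and making the lowest number outermost (matching the middleware1(middleware2(app)) notation) are all assumptions, as are the function names.

```python
import re

_PREFIX = re.compile(r"^([A-Za-z_]\w*):(.*)$")           # "middleware:/ 1" -> ("middleware", "/ 1")
_TRAILING_NUMBER = re.compile(r"\s+(-?\d+(?:\.\d+)?)$")  # " 1", " -2", " 0.5"


def parse_section_name(section):
    """Split a section name into (prefix, name, order).

    Sections without a recognizable prefix are applications ("app").
    Only middleware sections get a trailing ordering number stripped.
    """
    match = _PREFIX.match(section)
    prefix, name = (match.group(1), match.group(2)) if match else ("app", section)
    order = None
    if prefix == "middleware":
        number = _TRAILING_NUMBER.search(name)
        if number:
            name, order = name[:number.start()], float(number.group(1))
    return prefix, name, order


def middleware_for(sections, app_name):
    """Middleware section names for app_name, sorted by their trailing number."""
    found = []
    for section in sections:
        prefix, name, order = parse_section_name(section)
        if prefix == "middleware" and name == app_name:
            found.append((0.0 if order is None else order, section))
    found.sort(key=lambda item: item[0])  # stable: ties keep file order
    return [section for _, section in found]


def wrap(app, middleware_factories):
    """Apply factories so the first one in the list is the outermost."""
    for factory in reversed(middleware_factories):
        app = factory(app)
    return app


sections = ["/", "middleware:/ 2", "middleware:/blog 1", "middleware:/ 1",
            "/ domain=foo.com", "middleware:/ domain=foo.com"]
print(middleware_for(sections, "/"))                 # ['middleware:/ 1', 'middleware:/ 2']
print(middleware_for(sections, "/ domain=foo.com"))  # ['middleware:/ domain=foo.com']
```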

>> All the applications in the server are put in a single dictionary, and
>> that is passed to the composer.  The composer by default is urlmap (which
>> also includes optional host-based dispatch).  You can specify another
>> composer with a global option "composer = (specifier)".
> 
> I'm not sure how I feel about that.

What would be the problem?
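A composer is just a callable that takes {name: app} and returns one WSGI app. A sketch of resolving "composer = (specifier)" for dotted specifiers only; the same resolver could serve the "use" option's dotted module/object form, while "egg:" specifiers would need an entry-point lookup instead. resolve and choose_composer are hypothetical names.

```python
import importlib


def resolve(spec):
    """Resolve 'pkg.module:attr.sub' or 'pkg.module.attr' to a Python object."""
    if ":" in spec:
        module_name, _, attr_path = spec.partition(":")
    else:
        module_name, _, attr_path = spec.rpartition(".")
    obj = importlib.import_module(module_name)
    for attr in attr_path.split("."):
        obj = getattr(obj, attr)
    return obj


def choose_composer(global_options, default):
    """Use the 'composer' global option if present, otherwise the default (e.g. urlmap)."""
    spec = global_options.get("composer")
    return resolve(spec.strip()) if spec else default


# composer = mypackage.compose:by_vhost   (hypothetical specifier)
# composer = choose_composer(global_options, default=urlmap_factory)
# application = composer(apps)
```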

>> The sections still have a "use" option, which indicates what implements
>> the option.  It will take "egg:", and dotted module/object names, and
>> module names may have trailing entry point specifiers (if the object
>> doesn't implement the "preferred" interface for that kind of output).
>> You cannot use "config:" anymore, instead that will be handled by the
>> config file format.
> 
> Good. I found this dual use of "use" to be confusing.

Actually I realize there has to be something like this, but probably 
simpler.  That is, you have to be able to say, "for this application, 
get the application from this other file".  That could simply be:

   [/blog]
   use = egg:WSGIConfig#load_config
   config_file = blog.ini

But that's kind of awkward, so I think it would be better if there was a 
clearer construct.  blog.ini might itself have internal structure and 
multiple applications, so we can't just use config file inlining to 
accomplish this.

>> The config files will use INITools, which is similar to ConfigParser but
>> a bit nicer.  It will include improved string substitution, including
>> similar constructs to what zc.buildout has.  While still keeping
>> applications firmly separated from the config files, I'm planning on
>> paying close attention to call signatures and exceptions to give good
>> error messages.  E.g., if you raise something like TypeError("... got an
>> unexpected keyword argument 'X'") I'll figure out where the keyword
>> arguments came from and improve the text of that error message.  Not
>> perfect, but should be passable, and better than what Paste Deploy 
>> does now.
> 
> One suggestion to improve error detection/warning is to use 
> RawConfigParser rather than ConfigParser and to track option access.  A 
> common mistake when writing configurations is to misspell something.  
> You end up with options that are ignored because they are misspelled. 
> This sort of error can be hard to spot.  Handlers could complain about 
> unused options, but this is hard to do if a DEFAULT section causes 
> options to appear in all sections.  Also, expecting handlers to do this 
> sort of error checking is a bit of a burden on handler writers. This may 
> not be such a problem for Paste Deploy handlers as it is for buildout 
> recipes.  In buildout, I decided to shift this error checking to 
> buildout itself.  Buildout tracks option access and warns about unused 
> options.  This can be very helpful and puts no burden on recipe writers.
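A minimal sketch of that kind of access tracking applied to a plain options dict such as global_conf. TrackingOptions is hypothetical (not buildout's implementation), and only [] and .get() count as reads; "in", iteration, and other dict methods are not tracked.

```python
class TrackingOptions(dict):
    """dict that records which keys were read, so unused options can be reported."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.accessed = set()

    def __getitem__(self, key):
        self.accessed.add(key)
        return super().__getitem__(key)

    def get(self, key, default=None):
        self.accessed.add(key)
        return super().get(key, default)

    def unused(self):
        """Keys that were never read: often misspellings."""
        return sorted(set(self) - self.accessed)


# The config says email_errors, but the application asks for error_email.
options = TrackingOptions({"email_errors": "bob@example.com", "debug": "true"})
debug = options["debug"]
error_email = options.get("error_email")  # silently None
print(options.unused())  # ['email_errors']
```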

The config files in this model entirely push out information, no one 
goes fishing into the config file.  So there's no way to determine access.

If you are restrictive in your entry point and don't include **kw, then 
you can raise errors about misspelled configuration.  This isn't true in 
global_conf, and I suppose I could track that, but tracking a dictionary 
around into Python code makes me a little uncomfortable.  It can be a 
real problem when I can't remember whether the error email setting 
(which is typically inherited through global_conf) is error_email or 
email_errors.
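For factories that don't take **kw, a misspelled option can be reported against the section it came from. This sketch checks up front with inspect.signature(...).bind() rather than rewriting the TypeError after the call (a swapped-in technique); check_factory_options, make_app and lenient_app are hypothetical. Options absorbed by **kw, or inherited through global_conf, still slip through.

```python
import inspect


def check_factory_options(factory, options, section):
    """Raise TypeError naming the section if factory(global_conf, **options) would fail.

    Only checks the call signature; nothing is actually called.
    """
    try:
        inspect.signature(factory).bind({}, **options)  # {} stands in for global_conf
    except TypeError as exc:
        raise TypeError(f"in config section [{section}]: {exc}") from exc


def make_app(global_conf, greeting="Hello", email=None):
    def app(environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [greeting.encode("utf-8")]
    return app


def lenient_app(global_conf, **kw):
    return make_app(global_conf)


check_factory_options(make_app, {"greeting": "Hi"}, "/")    # passes
check_factory_options(lenient_app, {"greting": "Hi"}, "/")  # passes: **kw hides the typo
try:
    check_factory_options(make_app, {"greting": "Hi"}, "/")
except TypeError as exc:
    print(exc)  # message names [/] and the unexpected 'greting'
```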

I am thinking about adding some kind of logging to all of this, so it's 
easy both to get an explanation of what exactly is being constructed, 
and to get a detailed report when something goes wrong.

-- 
Ian Bicking : ianb at colorstudy.com : http://blog.ianbicking.org
             : Write code, do good : http://topp.openplans.org/careers