[Web-SIG] wsgiconfig design

Sun Jul 8 14:10:27 CEST 2007

On Jul 7, 2007, at 3:01 PM, Ian Bicking wrote:

> Jim Fulton wrote:
..
>> I do have one potential complaint about the entry-point APIs.  The  
>> applications my company builds have configurations that are too  
>> complex to fit in a single config-parser section.  To handle these  
>> configurations, I'd need to be able to read multiple sections, or  
>> to refer to an external configuration.  I think the latter is the
>> current recommended approach for Paste Deploy.  If you want to  
>> keep that approach, then the existing entry-point APIs are fine.   
>> (I personally want to be able to put all of my configuration in a
>> single file, but zc.buildout lets me do that, so I don't need  
>> Paste Deploy to do that for me.)
>
> As I've been coding, I've actually been thinking about passing a
> complete config dictionary in instead of global_conf.  So it would be
> like {section: {option: value, ...}}.

BTW, I've become a big fan of the ConfigParser module and format,  
mainly because of its simplicity. The simplicity of a dictionary of
dictionaries is very powerful, IMO.  Of course, the ConfigParser API  
is pretty horrible.  Fortunately, it's trivial to convert a parser  
object to a mapping of mappings and that's generally one of the first  
things I do after I've parsed a file.
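Here is a minimal sketch of that conversion using Python's standard configparser module. The helper name parser_to_dict is made up for illustration. Each section's dict also picks up any options defined in [DEFAULT].

```python
import configparser


def parser_to_dict(parser):
    """Convert a parsed ConfigParser into {section: {option: value}}."""
    # parser.items(name) merges in any options inherited from [DEFAULT].
    return {name: dict(parser.items(name)) for name in parser.sections()}


cp = configparser.ConfigParser()
cp.read_string("""
[/]
use = egg:MyPackage
greeting = Hello

[server]
host = 127.0.0.1
""")
config = parser_to_dict(cp)
# config == {'/': {'use': 'egg:MyPackage', 'greeting': 'Hello'},
#            'server': {'host': '127.0.0.1'}}
```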

> This lets you look into other
> application's sections, which maybe isn't ideal.

Why? Are you afraid that a handler will look at something it  
shouldn't?  Who cares? Relax. This isn't the Spanish Inquisition. :)

Embrace the simplicity of a mapping of mappings.

> Another option that occurs to me now might be something like
> [config:app_name section_name], and then pass in section_name={dict of
> options} as a keyword argument.  I.e.:
>
>   [/]
>   use = egg:MyPackage
>   greeting = Hello
>
>   [config:/ email]
>   smtp_server = localhost
>   email = bob at example.com
>
> That leads to mypackage.wsgi_app(global_conf, greeting="Hello",
> email={'smtp_server': 'localhost', 'email': 'bob at example.com'})
>
> I think I prefer the latter.
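The sketch below makes Ian's second proposal concrete. It shows how a loader might fold [config:APP NAME] sections into keyword arguments for APP's factory. It works on an already-parsed {section: {option: value}} mapping, and the function name is hypothetical. It also assumes two things: application names contain no spaces, and the loader consumes the "use" option rather than passing it on.

```python
def collect_app_kwargs(config, app_name):
    """Build keyword arguments for app_name's factory (hypothetical helper).

    Assumes the proposed convention: options in [app_name] become plain
    keyword arguments (except 'use'), and each [config:app_name extra]
    section becomes a keyword argument 'extra' holding that section's dict.
    """
    kwargs = {k: v for k, v in config.get(app_name, {}).items() if k != "use"}
    prefix = "config:"
    for section, options in config.items():
        if not section.startswith(prefix):
            continue
        # "config:/ email" -> target "/", extra "email"
        target, _, extra = section[len(prefix):].partition(" ")
        if target == app_name and extra:
            kwargs[extra] = dict(options)
    return kwargs


config = {
    "/": {"use": "egg:MyPackage", "greeting": "Hello"},
    "config:/ email": {"smtp_server": "localhost", "email": "bob@example.com"},
}
kwargs = collect_app_kwargs(config, "/")
# kwargs == {'greeting': 'Hello',
#            'email': {'smtp_server': 'localhost', 'email': 'bob@example.com'}}
```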

I prefer the simpler model.  For one thing, it lets you share data  
among multiple sections. Maybe this isn't important for Paste Deploy.  
Having said that, I think your suggestion is fine and is also less
verbose than the simpler approach, because, in the simpler approach,  
the root section will tend to have options saying what other sections  
to read.

If you are going to do something like this, then, IMO, you might also  
consider:

   [/]
   use = egg:MyPackage
   greeting = Hello
   email =
       smtp_server = localhost
       email = bob at example.com
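The indented form above survives a standard configparser parse. The continuation lines become part of the email value, one per line. A hypothetical helper could then split that value into a dict, as in this sketch. It handles only one level of nesting and splits each line on its first '='.

```python
import configparser


def parse_nested(value):
    """Turn an indented block of 'key = value' lines into a dict (hypothetical helper)."""
    result = {}
    for line in value.splitlines():
        line = line.strip()
        if not line:
            continue  # the value starts with a newline because 'email =' is empty
        key, sep, val = line.partition("=")
        if not sep:
            raise ValueError(f"expected 'key = value', got {line!r}")
        result[key.strip()] = val.strip()
    return result


cp = configparser.ConfigParser()
cp.read_string("""
[/]
use = egg:MyPackage
greeting = Hello
email =
    smtp_server = localhost
    email = bob@example.com
""")
email = parse_nested(cp["/"]["email"])
# email == {'smtp_server': 'localhost', 'email': 'bob@example.com'}
```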

>
>>> I will probably rename them, but also support the old Paste
>>> Deploy names.
>> Why?
>
> Why rename?  To go with the new package, and because I might change
> the entry points slightly (particularly global_conf or those extra
> config sections).  I'd support the old names because I can, even if
> they are somewhat different APIs; a nice side effect of explicit
> entry point groups.

Well, if you change the APIs, you pretty much have to rename them.   
But I wouldn't do it otherwise.  Of course, it's up to you.

>>> There will be a few levels of pluggability.  The first is the loader
>>> itself, which will default to the wsgiconfig loader itself.  This is
>>> only applicable to file-based configuration.  When you load up the
>>> application from a file, it will search for a #! line that specifies
>>> a loader, which is the name of a distribution or module.  It will
>>> search the distribution for some entry point
>>> (wsgiconfig.config_loader:main), and that entry point is a callable
>>> like:
>>>
>>>    def config_loader(filename, options, object_type):
>>>        return wsgi_application
>> which section would it get these from?
>
> Which entry point group?

I'm sorry, I was unclear.  I meant to ask which section the options
would come from.  You answered that below.
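For the #! lookup quoted above, a loader might do something like the following sketch. It reads the proposal's wsgiconfig.config_loader:main as entry point group wsgiconfig.config_loader and entry point name main. That reading, the default loader name, and both function names are assumptions. The proposal's "module name" alternative is left out.

```python
from importlib.metadata import distribution

DEFAULT_LOADER = "wsgiconfig"  # assumption: name used when no #! line is present


def find_loader_name(text, default=DEFAULT_LOADER):
    """Return the loader named on a leading '#!' line, or the default.

    A '#!' line starts with '#', so configparser-style parsers already
    treat it as a comment and the rest of the file parses unchanged.
    """
    lines = text.splitlines()
    if lines and lines[0].startswith("#!"):
        name = lines[0][2:].strip()
        if name:
            return name
    return default


def load_config_loader(dist_name, group="wsgiconfig.config_loader", name="main"):
    """Load the loader callable from a distribution's entry points (hypothetical group)."""
    for ep in distribution(dist_name).entry_points:
        if ep.group == group and ep.name == name:
            return ep.load()
    raise LookupError(f"{dist_name} has no {group}:{name} entry point")
```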

...

>>> options is a flat dictionary of options that are passed in, which
>>> the config loader can use at its discretion.  A common way to use it
>>> would be for variable substitution.  This allows for things like
>>> "paster serve config.ini var1=value var2=value", for ad hoc
>>> customization.  It returns a single application.  (This does mean
>>> that a feature from Paste Deploy is lost, where you could do
>>> "paster serve config.ini#section_name" to load a specific
>>> application from several defined in a file -- but you are less
>>> likely to get dead sections or confusing config file factorings.)
>>>
>>> object_type is the kind of object we want to get out.  Here I'll
>>> only specify 'wsgi.application'.  'wsgi.server' will probably also
>>> be implemented, but that's all I plan for now.  'wsgi.appserver' or
>>> something might be possible, for the process manager that runs an
>>> entire application.
>> I don't really follow this. Maybe an example would help.
>
> Well, let's say you have a configuration like:
>
>   [/]
>   use = egg:MyApp
>
>   [middleware:/]
>   use = egg:Paste#profile
>
>   [server:main] # or maybe just server?
>   use = egg:Paste#http
>   host = 127.0.0.1:${port}
>
> Then you start it up with "serve config.ini port=8090".  That's the
> idea of the options dictionary: it holds {'port': '8090'}.
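One way to get from "serve config.ini port=8090" to the substituted host line uses the standard library's string.Template, whose ${name} syntax matches the example. This sketch is an assumption for illustration, not how INITools or wsgiconfig actually do substitution. Template also expands a bare $name and treats $$ as a literal $.

```python
import string


def parse_cli_options(args):
    """Turn ['port=8090', ...] into {'port': '8090', ...}."""
    options = {}
    for arg in args:
        name, sep, value = arg.partition("=")
        if not sep:
            raise ValueError(f"expected name=value, got {arg!r}")
        options[name] = value
    return options


def substitute_options(text, options):
    """Replace ${name} references in config text with option values.

    safe_substitute leaves unknown references such as ${missing} in place
    instead of raising, so a later stage can resolve or report them.
    """
    return string.Template(text).safe_substitute(options)


options = parse_cli_options(["port=8090"])
text = substitute_options("[server:main]\nhost = 127.0.0.1:${port}\n", options)
# text == '[server:main]\nhost = 127.0.0.1:8090\n'
```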

Ah, so command-line options.

In the example, a port is something you'd want to make available in a
server section, isn't it?  Why do you want the loader to get
command-line options?

...

>>> Unlike Paste Deploy, section names will not be arbitrary.  A section
>>> name has a prefix and name.  The prefix, as in Paste Deploy, says
>>> what you are describing.  The default prefix is "app:"; you can also
>>> give "middleware:".  Prefixes not recognized will be ignored.  A
>>> possible prefix might be "logging:" for logging, which if I don't
>>> implement it will be initially ignored (but someone else could
>>> handle it).
>> IMO, it would be nice *not* to reinvent yet another logging  
>> configuration handler. The standard library already defines one.  
>> If we don't like it, we should make it better.
>
> I don't like it, but I don't feel like improving it either ;)

I hope you don't consider that a reason to reinvent it.  I would hope  
that, in the future, when someone gets that itch, they'll resist and  
improve the standard one instead.

We invented ZConfig, which has its own logging configuration
"schema".  The result?  It hasn't remained up to date with the
logging package, and people who use it don't have access to some
useful loggers without screwing with ZConfig schemas (which isn't
fun).  Bad bad bad.

>   Anyway,
> this is basically just a convention to group all the sections together
> for logging based on that prefix.  The logging module's configuration
> handler can handle it,

It can?  I think it looks for specific un-prefixed section names.

> or we could wrap it slightly (if you loaded
> logging you wouldn't actually be returning anything, you'd be updating
> the global logging configuration, which may or may not be what we  
> want).

I'm not sure what you mean here.  In theory, if you simply let people  
use the sections defined by the logging module, you could point the  
standard logging module at your config and be done.  You could even  
condition this on whether the defined sections are present.   
Unfortunately, I don't speak from experience because the applications
I routinely work with use ZConfig.
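This sketch shows the "condition this on whether the defined sections are present" idea using only the standard library. logging.config.fileConfig expects fixed, un-prefixed section names, so the loader can check for them and otherwise leave logging alone. The function name is hypothetical. Passing a parser object instead of a filename requires Python 3.4 or later.

```python
import configparser
import logging.config


def configure_logging_if_present(parser):
    """Hand the config to the stdlib logging module if it has logging sections.

    fileConfig() looks for fixed section names ([loggers], [handlers],
    [formatters], plus [logger_*], [handler_*], [formatter_*]), so the check
    is simply whether the top-level ones exist.
    """
    if all(parser.has_section(s) for s in ("loggers", "handlers", "formatters")):
        # fileConfig accepts a RawConfigParser (or subclass) instance.
        logging.config.fileConfig(parser, disable_existing_loggers=False)
        return True
    return False
```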

>
>>> Similarly, as with Paste Deploy, the server can be defined with
>>> "server:".  For now all we're concerned with is applications,
>>> middleware, and composites.
>> Maybe I'm missing your point, but I thought the value of Paste  
>> Deploy was to be able to have a way to define an end-to-end
>> configuration of applications, middleware and server.
>
> The server is a little bit of an outlier.  The applications and
> middleware can be composed directly and fairly opaquely, but the  
> server needs to be connected to the application more explicitly and  
> outside of wsgiconfig.  OTOH, it's real handy to be able to put the  
> server section in the same config file.

IMO, it's very important to put the server in the config.  Why make  
the program using the config do that?

I really want to be able to at least do all of the WSGI
configuration in one place.
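As a rough illustration of putting the server in the same config, this sketch starts the standard library's wsgiref server from an already-parsed server section. It expects the section's host option in the host:port form used in Ian's example. wsgiref stands in for egg:Paste#http. The function names and the default port are assumptions, and IPv6 literals are not handled.

```python
from wsgiref.simple_server import make_server


def split_host(value, default_port=8080):
    """Split 'host:port' into (host, port); default_port is an assumption."""
    host, sep, port = value.rpartition(":")
    if not sep:
        return value, default_port
    return host, int(port)


def serve_from_section(section, app):
    """Serve app with the stdlib reference server, using the section's 'host'."""
    host, port = split_host(section.get("host", "127.0.0.1"))
    with make_server(host, port, app) as httpd:
        httpd.serve_forever()
```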

Note that, traditionally, Zope has allowed multiple servers to exist  
in a single process.  For smaller applications that can be handled by  
a single process, this is a significant win.  Selfishly, this isn't  
so important to me as the applications ZC deals with are large scale  
and have many processes so having a single server per process is the  
norm for us.  Others may feel the loss, though.

>>> The applications and middleware are grouped together using the
>>> names.  That is, if you have an application "/" and a middleware
>>> "middleware:/", then the middleware wraps that application.
>>> Middleware sections can have trailing numbers to indicate ordering
>>> and keep section names unique.  Thus "middleware:/ 1",
>>> "middleware:/ 2", etc.  Negative numbers and floats are allowed.
>>> Anything but trailing numbers is considered part of the name; thus
>>> names can have parameters or other structure.
>> Hm.  Sounds a bit too magic to me.  Maybe an example will make it  
>> look better. :)
>
> Well, what we are trying to create is a basic
> middleware1(middleware2(app)) composition, where the app is required
> and the middleware is not.
>
> We group these together by name; with urlmap, that name is a path.
> So / is the main app, /blog is the app mounted at /blog, etc.  Then  
> we need an ordered list of the middleware to apply.  There needs to  
> be some way to distinguish a middleware section from an application  
> section, hence middleware:.  And then a way of ordering them.  We  
> could use the section ordering, except duplicate section names are  
> no good anyway, even if we did keep track of the order they were  
> defined in.  So I'm proposing a trailing number.
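Ian's naming scheme can be sketched in three steps. First, pick out the middleware: sections for one application. Next, sort them by their optional trailing number. Finally, wrap the app so the first section listed ends up outermost, matching middleware1(middleware2(app)). Two choices here are assumptions: an unnumbered section sorts as 0, and a lower number means further out. The function names are made up.

```python
import re

# "middleware:<name>" plus an optional trailing number, e.g. "middleware:/ -1.5".
MIDDLEWARE_RE = re.compile(
    r"^middleware:(?P<name>.*?)(?:\s+(?P<order>-?\d+(?:\.\d+)?))?$"
)


def middleware_for(section_names, app_name):
    """Return app_name's middleware sections, lowest trailing number first.

    Assumption: a section without a number sorts as if it were numbered 0.
    """
    found = []
    for section in section_names:
        m = MIDDLEWARE_RE.match(section)
        if m and m.group("name") == app_name:
            order = float(m.group("order")) if m.group("order") else 0.0
            found.append((order, section))
    return [section for _, section in sorted(found)]


def compose(app, middleware):
    """Wrap app so the first middleware listed is outermost: m1(m2(app))."""
    for factory in reversed(middleware):
        app = factory(app)
    return app
```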

Personally, I much prefer explicit composition sections, as I think  
you have now.  Then you simply have an option that names the nodes to
be composed in order.
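For contrast, an explicit composition section might look like the following sketch, roughly in the spirit of Paste Deploy's [pipeline:...] sections. Several details are assumptions: the option name pipeline, the outermost-first ordering, and the factories mapping. That mapping stands in for whatever resolves each section's use option.

```python
def compose_pipeline(config, section, factories):
    """Build an app from an explicit composition section (hypothetical format).

    config[section]["pipeline"] lists section names, outermost first; the last
    entry is the application and the rest are middleware wrapping it.
    """
    names = config[section]["pipeline"].split()
    if not names:
        raise ValueError(f"[{section}] pipeline is empty")
    *middleware, app_name = names
    app = factories[app_name](config[app_name])
    for name in reversed(middleware):
        app = factories[name](config[name], app)
    return app
```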

...

>>> All the applications in the server are put in a single dictionary,
>>> and that is passed to the composer.  The composer by default is
>>> urlmap (which also includes optional host-based dispatch).  You can
>>> specify another composer with a global option
>>> "composer = (specifier)".
>> I'm not sure how I feel about that.
>
> What would be the problem?

I'd prefer the composers be more explicitly part of the  
configuration.  That is, a composer is defined with a section, like  
everything else.
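At its core, a urlmap-style composer is a dispatcher over the {path: app} dictionary. This sketch is a simplified stand-in, not Paste's urlmap. It picks the longest matching path prefix, then moves that prefix from PATH_INFO to SCRIPT_NAME as WSGI expects. It omits host-based dispatch.

```python
def make_path_dispatcher(apps):
    """Dispatch to WSGI apps by longest matching path prefix.

    A simplified stand-in for a urlmap-style composer (no host-based dispatch).
    """
    prefixes = sorted(apps, key=len, reverse=True)  # '/blog' before '/'

    def dispatcher(environ, start_response):
        path = environ.get("PATH_INFO", "")
        for prefix in prefixes:
            base = prefix.rstrip("/")
            if base == "" or path == base or path.startswith(base + "/"):
                environ = dict(environ)  # don't mutate the caller's dict
                environ["SCRIPT_NAME"] = environ.get("SCRIPT_NAME", "") + base
                environ["PATH_INFO"] = path[len(base):]
                return apps[prefix](environ, start_response)
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not Found"]

    return dispatcher
```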

>>> The sections still have a "use" option, which indicates what
>>> implements the option.  It will take "egg:" and dotted module/object
>>> names, and module names may have trailing entry point specifiers (if
>>> the object doesn't implement the "preferred" interface for that kind
>>> of output).  You cannot use "config:" anymore; instead that will be
>>> handled by the config file format.
>> Good. I found this dual use of "use" to be confusing.
>
> Actually I realize there has to be something like this, but  
> probably simpler.

Sure.

>   That is, you have to be able to say, "for this application, get  
> the application from this other file".  That could simply be:
>
>   [/blog]
>   use = egg:WSGIConfig#load_config
>   config_file = blog.ini
>
> But that's kind of awkward, so I think it would be better if there  
> was a clearer construct.  blog.ini might itself have internal  
> structure and multiple applications, so we can't just use config  
> file inlining to accomplish this.

I mainly think this is a different concept and should have a separate  
option name, whatever syntax is used.
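A loader could resolve the use values discussed above in two ways: dotted module/object names through importlib, and egg: names through entry points. Loading another config file stays out of use entirely. In this sketch the function names are hypothetical and the caller passes in the entry point group. Defaulting the entry point name to main is an assumption borrowed from Paste Deploy, since the egg:MyPackage examples give no name.

```python
import importlib
from importlib.metadata import distribution


def resolve_dotted(spec):
    """Import an object named 'pkg.module:attr' or 'pkg.module.attr'."""
    if ":" in spec:
        module_name, _, attr_path = spec.partition(":")
    else:
        module_name, _, attr_path = spec.rpartition(".")
    obj = importlib.import_module(module_name)
    for attr in attr_path.split("."):
        obj = getattr(obj, attr)
    return obj


def resolve_egg(spec, group):
    """Load 'Dist#name' from an entry point group; name defaults to 'main'."""
    dist_name, _, ep_name = spec.partition("#")
    ep_name = ep_name or "main"
    for ep in distribution(dist_name).entry_points:
        if ep.group == group and ep.name == ep_name:
            return ep.load()
    raise LookupError(f"{dist_name} has no entry point {ep_name!r} in {group!r}")


def resolve_use(value, group):
    """Resolve a 'use' value; other config files get a separate option."""
    if value.startswith("egg:"):
        return resolve_egg(value[len("egg:"):], group)
    return resolve_dotted(value)
```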

>>> The config files will use INITools, which is similar to ConfigParser
>>> but a bit nicer.  It will include improved string substitution,
>>> including similar constructs to what zc.buildout has.  While still
>>> keeping applications firmly separated from the config files, I'm
>>> planning on paying close attention to call signatures and exceptions
>>> to give good error messages.  E.g., if you raise something like
>>> TypeError("... got an unexpected keyword argument 'X'"), I'll figure
>>> out where the keyword arguments came from and improve the text of
>>> that error message.  Not perfect, but should be passable, and better
>>> than what Paste Deploy does now.
>> One suggestion to improve error detection/warning is to use
>> RawConfigParser rather than ConfigParser and to track option
>> access.  A common mistake when writing configurations is to  
>> misspell something.  You end up with options that are ignored  
>> because they are misspelled. This sort of error can be hard to  
>> spot.  Handlers could complain about unused options, but this is  
>> hard to do if a DEFAULT section causes options to appear in all
>> sections.  Also, expecting handlers to do this sort of error
>> checking is a bit of a burden on handler writers. This may not be
>> such a problem for Paste Deploy handlers as it is for buildout
>> recipes.  In buildout, I decided to shift this error checking to  
>> buildout itself.  Buildout tracks option access and warns about  
>> unused options.  This can be very helpful and puts no burden on  
>> recipe writers.
>
> The config files in this model entirely push out information, no  
> one goes fishing into the config file.  So there's no way to  
> determine access.
>
> If you are restrictive in your entry point and don't include **kw,  
> then you can raise errors about misspelled configuration.
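Ian's point about restrictive entry points can be pushed a step further. Before calling a factory, the loader can compare the section's options with the factory's signature and name the offending section itself, instead of rewriting a TypeError afterwards. Here is a minimal sketch using inspect.signature, with hypothetical function names. This check cannot catch anything on a factory that accepts **kw, which DEFAULT inheritance tends to force.

```python
import inspect


def unknown_options(factory, options):
    """Return option names the factory cannot accept as keyword arguments.

    A factory that declares **kw accepts anything, so nothing is flagged.
    """
    params = inspect.signature(factory).parameters.values()
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params):
        return []
    accepted = {
        p.name
        for p in params
        if p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD,
                      inspect.Parameter.KEYWORD_ONLY)
    }
    return sorted(set(options) - accepted)


def call_factory(factory, global_conf, options, section):
    """Call factory(global_conf, **options), naming the section on bad options."""
    unknown = unknown_options(factory, options)
    if unknown:
        raise TypeError(
            f"[{section}]: {factory.__name__}() does not accept option(s): "
            + ", ".join(unknown)
        )
    return factory(global_conf, **options)
```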

Sure, but don't options put in DEFAULT appear everywhere?  Won't that  
make it impossible to avoid **kw and to complain about unrecognized  
options?

>   This isn't true in global_conf, and I suppose I could track that,  
> but tracking a dictionary around into Python code makes me a little  
> uncomfortable.  It can be a real problem when I can't remember  
> whether the error email setting (which is typically inherited  
> through global_conf) is error_email or email_errors.

It's OK to make handlers do error checking, but, unless I'm missing  
something, DEFAULT works against error checking no matter where it's  
done.
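Here is a sketch of buildout-style access tracking that sidesteps part of the DEFAULT problem. It parses the file twice. The first parse is normal. The second points configparser's default_section at a name that never appears, so [DEFAULT] becomes an ordinary section. The sketch then reports only options written explicitly in a section that nobody read. The class and function names are hypothetical. A misspelled key in [DEFAULT] itself still goes unnoticed.

```python
import configparser


class TrackedSection(dict):
    """Section options that record which keys a handler actually read."""

    def __init__(self, items, explicit):
        super().__init__(items)
        self.explicit = set(explicit)  # keys written in this section itself
        self.accessed = set()

    def __getitem__(self, key):
        self.accessed.add(key)
        return super().__getitem__(key)

    def get(self, key, default=None):
        self.accessed.add(key)
        return super().get(key, default)

    def unused(self):
        # Only explicitly written keys are reported; DEFAULT-inherited ones are not.
        return sorted(self.explicit - self.accessed)


def load_tracked(text):
    merged = configparser.RawConfigParser()
    merged.read_string(text)
    # A second parse with a dummy default_section makes [DEFAULT] an ordinary
    # section, so options() lists only what each section really contains.
    explicit = configparser.RawConfigParser(default_section="__no_default__")
    explicit.read_string(text)
    return {name: TrackedSection(merged.items(name), explicit.options(name))
            for name in merged.sections()}
```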

> I am thinking about adding some kind of logging to all of this, so  
> it's easy both to get an explanation of what exactly is being  
> constructed, and to get a detailed report when something goes wrong.

Logging is good.

Jim

--
Jim Fulton			mailto:jim at zope.com		Python Powered!
CTO 				(540) 361-1714			http://www.python.org
Zope Corporation	http://www.zope.com		http://www.zope.org