[Web-SIG] multi-threaded or multi-process wsgi apps

Mon Nov 26 18:15:12 CET 2007

Chris Withers wrote:
> Hey All,
> 
> I hope I have the right list, if not please point me in the right 
> direction...
> 
> Likewise, if there are good docs that cover all of this, please send me 
> their way ;-)
> 
> Right, I'm curious as to how wsgi applications end up being 
> multi-threaded or multi-process and if they are, how they share 
> resources such as databases and configuration.

At least in Pylons apps, configuration is set up during instantiation. 
Configuration is generally copyable (consisting of stuff like strings, 
not open file objects), so it can be cloned across processes easily. 
Things like database connections are handled by libraries that do 
pooling on their own.
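
As a rough illustration of what "copyable" means here, the sketch below checks whether a configuration dictionary survives a pickle round trip, which is roughly what it needs to do to be handed to another process. The helper name is_copyable and the sample keys are made up for the example and are not part of Pylons or Paste.

```python
import os
import pickle


def is_copyable(config):
    """Return True if config survives a pickle round trip unchanged.

    Hypothetical helper, only meant to illustrate the idea.
    """
    try:
        return pickle.loads(pickle.dumps(config)) == config
    except (TypeError, pickle.PicklingError):
        return False


# Plain strings copy without trouble (keys are invented for the example).
plain_config = {"sqlalchemy.url": "sqlite:///app.db", "debug": "false"}

# An open file object does not, so it should stay out of the configuration.
with open(os.devnull) as handle:
    config_with_file = dict(plain_config, log=handle)
    file_config_ok = is_copyable(config_with_file)
```

Connections and other live resources are better created inside each process from the copied settings, which is what the pooling libraries do.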

> There's a couple of reasons I'm asking...
> 
> The first was something Chris McDonough said about one of the issues 
> they're having with the repoze project: when using something like 
> mod_wsgi, it's the first person to hit each thread that takes the hit of 
> loading the configuration and opening up the zodb. Opening the ZODB, in 
> particular, can take a lot of time. How should repoze be structured such 
> that all the threads load their config and open their databases when 
> apache is restarted rather than when each thread is first hit?
> 
> The second is a problem I see an app I'm working on heading towards. The 
> app has web-alterable configuration, so in a multi-threaded and 
> particularly multi-process environment, I need some way to get the other 
> threads or processes to re-read their configuration when it has changed.

In Paste/Pylons the configuration is stored in the WSGI environment 
(which is per-request) and put into a threadlocal object for access.  
Also, in general, when you use the Paste Deploy style of factory for 
WSGI applications, *if* the factory is sufficiently fast you can 
instantiate applications dynamically or lazily.  E.g.:

import pkg_resources

def make_dynamic_configurable_application(
        global_conf, subapp_ep_name, config_source,
        **config_source_args):
    # Accept Paste Deploy style names such as 'egg:SomeDist#name'
    if subapp_ep_name.startswith('egg:'):
        subapp_ep_name = subapp_ep_name[4:]
    if '#' in subapp_ep_name:
        dist, ep_name = subapp_ep_name.split('#', 1)
    else:
        dist = subapp_ep_name
        ep_name = 'main'
    # load_entry_point takes (dist, group, name), in that order
    app_factory = pkg_resources.load_entry_point(
        dist, 'paste.app_factory', ep_name)
    # You might want to do something similar with config_source
    # for now we'll just imagine it's a function that returns a
    # dictionary
    global_conf = global_conf.copy()
    global_conf['config_source'] = config_source
    app_cache = {}
    def application(environ, start_response):
        config = config_source(environ, **config_source_args)
        # A list is unhashable, so use a tuple as the cache key
        config_key = tuple(sorted(config.items()))
        if config_key not in app_cache:
            # Probably should do some locking here...
            app = app_factory(global_conf, **config)
            app_cache[config_key] = app
        else:
            app = app_cache[config_key]
        return app(environ, start_response)
    return application
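
The locking mentioned in the comment above could look something like the following. This is only a sketch under some assumptions: it takes the app factory directly instead of resolving an entry point, and make_cached_dispatcher, demo_factory, demo_config_source and call are invented names used to exercise it without Paste Deploy installed.

```python
import threading


def make_cached_dispatcher(app_factory, config_source, global_conf=None):
    """Build one application per distinct configuration, guarded by a lock."""
    global_conf = dict(global_conf or {})
    app_cache = {}
    cache_lock = threading.Lock()

    def application(environ, start_response):
        config = config_source(environ)
        config_key = tuple(sorted(config.items()))
        app = app_cache.get(config_key)
        if app is None:
            with cache_lock:
                # Check again: another thread may have built it meanwhile.
                app = app_cache.get(config_key)
                if app is None:
                    app = app_factory(global_conf, **config)
                    app_cache[config_key] = app
        return app(environ, start_response)
    return application


# Stand-ins for a real app factory and config source (hypothetical).
built = []


def demo_factory(global_conf, **config):
    built.append(config)

    def app(environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [("greeting=%s" % config["greeting"]).encode("utf-8")]
    return app


def demo_config_source(environ):
    return {"greeting": environ.get("HTTP_X_GREETING", "hello")}


def call(app, environ):
    """Invoke a WSGI app and return (status, body) for inspection."""
    status = []
    body = b"".join(app(environ, lambda s, h: status.append(s)))
    return status[0], body


dispatcher = make_cached_dispatcher(demo_factory, demo_config_source)
```

Calling the dispatcher repeatedly with the same configuration reuses the cached application, and the factory only runs again when the config source returns something new.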

This all builds off what Paste Deploy already provides, and would allow 
you to apply dynamic configuration to any Paste Deploy-compatible 
application, if that application can also safely handle multiple loaded 
instances/configurations.  Pylons applications work fine this way.

Also note that the configuration loader itself is configured using the 
Paste Deploy interfaces.
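
For the per-request environment plus threadlocal arrangement mentioned earlier, a minimal sketch might look like this. It is not Paste's actual code: the class name, the current_config helper and the 'example.config' environ key are all made up for illustration.

```python
import threading

_local = threading.local()


def current_config():
    """Return the config registered for the request on this thread, if any."""
    return getattr(_local, "config", None)


class ThreadLocalConfigMiddleware(object):
    """Hypothetical middleware: expose config via environ and a threadlocal."""

    def __init__(self, app, config):
        self.app = app
        self.config = config

    def __call__(self, environ, start_response):
        # 'example.config' is an invented key, not one Paste defines.
        environ["example.config"] = self.config
        previous = current_config()
        _local.config = self.config
        try:
            return self.app(environ, start_response)
        finally:
            # Restore whatever was registered before this request.
            _local.config = previous


def show_title(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [current_config()["title"].encode("utf-8")]


wrapped = ThreadLocalConfigMiddleware(show_title, {"title": "demo"})
```

One limitation of this simple version: the threadlocal is reset as soon as the wrapped application returns, so an application that produces its body lazily would need to read the configuration before returning its iterable.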

-- 
Ian Bicking : ianb at colorstudy.com : http://blog.ianbicking.org