database connection pooling from mod_python app

Matt Goodall matt at pollenation.net
Wed Oct 8 16:57:04 EDT 2003


Ian Bicking wrote:

> On Wednesday, October 8, 2003, at 11:23 AM, Anthony_Barker wrote:
>
>> I have been searching around for database connection pooling for a
>> mod_python app.
>
>
> My understanding is that you don't really need connection pooling in 
> mod_python -- you can simply put the connection in a global variable.

In fact, connection pooling within an application hosted by a forking
Apache server is redundant since each process only ever needs one
connection at a time: 1 process == 1 request == 1 connection.

>   Maybe with something like:
>
> def get_connection():
>     global _conn
>     try:
>         return _conn
>     except NameError:
>         _conn = (make your connection)
>         return _conn
>
>
> I don't know mod_python enough to know if there's other magic 
> incantations you need.  You don't need "pooling", because each request 
> is served in its own process -- you simply want to reuse the 
> connection for future requests.  If there's 10 processes, you'll need 
> 10 separate connections, but those 10 processes will over time serve 
> many requests so you can use those 10 connections over and over. 

Ian's system will work just fine (although I think a module-scope
variable is nicer than a global; there is a sketch after the
definitions), but there are a few things to be aware of, as detailed
below. First of all I would like to define a couple of terms to try to
avoid any confusion:

"persistent connection" means a database connection which is opened once
and never closed.

"connection pool" means a managed collection of (possibly) open
connections. A connection pool often has a maximum pool size and may
wait for a period of time (i.e. block) for a connection to be closed
rather than open a new connection if the pool size would be exceeded.
Connection objects from a pool are generally wrapped so that close()
returns the connection to the pool rather than actually closing it.
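
To make that concrete, here is a minimal sketch of a persistent
connection using a module-scope variable (the dbmodule import and the
connect() arguments are placeholders - substitute your own DB-API
module and parameters):

import dbmodule   # placeholder: MySQLdb, psycopg, ...

_conn = None   # module-scope; one connection per Apache process

def get_connection():
    global _conn
    if _conn is None:
        _conn = dbmodule.connect('...')   # your real connection arguments
    return _conn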

--
1. Reused Transactions vs. Connection Pool Leaks

It is critical that a persistent connection's transaction is *always*
completed - either with commit or rollback. If you forget to do this
then database operations during the next request that the process
handles will use the same transaction ... which could be disastrous.

A connection pool will only give out a connection that is "closed" and,
typically, rollback() is called by the pool manager as a connection is
returned. This avoids the problem of transactions spanning multiple
requests. A common problem when using connection pools is the connection
leak - a connection is not closed and hence is never returned to the
pool - but IMHO this is less serious. I would rather see an application
hang waiting for a connection to become available than have corrupt data.
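
For illustration, here is a very rough sketch of that pool behaviour
using the standard Queue module (the class names are invented, and a
real pool would open connections lazily, handle errors and so on):

import Queue

class _PooledConnection:
    """Wraps a real connection so that close() returns it to the pool."""
    def __init__(self, pool, conn):
        self._pool = pool
        self._conn = conn
    def cursor(self):
        return self._conn.cursor()
    def commit(self):
        return self._conn.commit()
    def rollback(self):
        return self._conn.rollback()
    def close(self):
        # Complete any outstanding transaction, then return the raw
        # connection to the pool instead of really closing it.
        self._conn.rollback()
        self._pool._queue.put(self._conn)

class ConnectionPool:
    def __init__(self, connect, maxsize):
        self._queue = Queue.Queue(maxsize)
        for i in range(maxsize):
            self._queue.put(connect())
    def connection(self):
        # get() blocks until a connection is returned if the pool is
        # empty, so a leak shows up as a hang rather than corrupt data.
        return _PooledConnection(self, self._queue.get())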

The reused transaction and connection leak problems are both programming
errors (use a finally block to avoid them - see the sketch below) but it
is easier to spot a connection pool leak - set the maximum pool size to
a low number and hammer the application. You'll soon find out if there
are any problems ;-). Add some logging to the pool implementation to
record who opens and closes connections and you can easily find
offending code that forgets to call close().
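
The finally idiom looks something like this (do_database_work() is
obviously a stand-in for your real application code, and
get_connection() is the persistent connection function from earlier):

def handle_request(req):
    conn = get_connection()
    try:
        do_database_work(conn)
        conn.commit()
    finally:
        # Effectively a no-op if commit() succeeded; if anything above
        # raised, the half-finished transaction is rolled back and can
        # never leak into the next request.
        conn.rollback()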

2. Apache forking

As Ian mentions, Apache creates multiple processes to handle requests. A
default installation of Apache 1.x on Debian has a MaxClients of 150,
i.e. 150 processes. If each one of those processes maintains a
persistent connection you can easily hit your database server's
connection limit. 150 is a *lot* of connections to keep open anyway, and
many requests - images, CSS, JavaScript etc. - will not even need
database access.

Perhaps you've tuned your Apache config and set MaxClients to something
more appropriate, 30 for instance. Even then, it's likely that the number
of simultaneous connections that are _required_ is much lower than 30;
you might only need 5.
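
For example, a (hypothetical) httpd.conf excerpt for a forking Apache
1.x might look like this - each worker process can hold one persistent
connection, so MaxClients is also an upper bound on the number of open
database connections:

StartServers       5
MinSpareServers    5
MaxSpareServers   10
MaxClients        30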

3. Multiple Applications

Now let's say your web server hosts 2 applications, each using a
different database. If persistent connections are used you've just
doubled the maximum number of open connections! This scenario applies to
an Apache instance with virtual hosting, and I think it's the same when
mod_python's PythonInterpreter configuration directive is used.
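
A hypothetical sketch, with the handler directives elided:

NameVirtualHost *

<VirtualHost *>
    ServerName app1.example.com
    PythonInterpreter app1
    # ... handler directives; application 1 talks to database 1 ...
</VirtualHost>

<VirtualHost *>
    ServerName app2.example.com
    PythonInterpreter app2
    # ... handler directives; application 2 talks to database 2 ...
</VirtualHost>

Each Apache process can now end up holding two persistent connections,
one per application.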
--

I'm not saying that persistent connections are inherently bad, just that
there are a number of issues to be aware of with Apache. If you are using a
forking Apache setup and you cannot afford to open new database
connections on each request then persistent connections may be your only
choice.

However, if you need connection pooling to allow your application to
scale effectively then you may be better off using an application server
(Zope, Webware, Twisted etc) or something like SCGI
(http://www.mems-exchange.org/software/scgi/). Another possibility,
which I have not yet tried, is SQL Relay (http://sqlrelay.sourceforge.net).

Sorry for the length of this post, I hope it helps.

Cheers, Matt

-- 
Matt Goodall, Pollenation Internet Ltd
w: http://www.pollenation.net
e: matt at pollenation.net