Python does not play well with others

Mon Feb 5 16:39:22 EST 2007

On Feb 6, 4:52 am, John Nagle <n... at animats.com> wrote:
> sjdevn... at yahoo.com wrote:
> > John Nagle wrote:
>
> >>Graham Dumpleton wrote:
>
> >>>On Feb 4, 1:05 pm, Paul Rubin <http://phr...@NOSPAM.invalid> wrote:
>
> >>>>"Paul Boddie" <p... at boddie.org.uk> writes:
> >>     Realistically, mod_python is a dead end for large servers,
> >>because Python isn't really multi-threaded.  The Global Python
> >>Lock means that a multi-core CPU won't help performance.
>
> > The GIL doesn't affect separate processes, and any large server that
> > cares about stability is going to be running a pre-forking MPM no
> > matter what language they're supporting.
>
>    Pre-forking doesn't reduce load; it just improves responsiveness.
> You still pay for loading all the modules on every request.  For
> many AJAX apps, the loading cost tends to dominate the transaction.
>
>    FastCGI, though, reuses the loaded Python (or whatever) image.
> The code has to be written to be able to process transactions
> in sequence (i.e. don't rely on variables initialized at load),
> but each process lives for more than one transaction cycle.
> However, each process has a copy of the whole Python system,
> although maybe some code gets shared.

As someone else pointed out, your understanding of how mod_python
works within Apache is somewhat wrong. I'll explain some things a bit
further to make it clearer for you.

When the main Apache process (parent) is started it will load all the
various Apache modules including that for mod_python. Each of these
modules has the opportunity to hook into various configuration phases
to perform actions. In the case of mod_python it will hook into the
post config phase and initialise Python which will in turn set up all
the builtin Python modules.

When Apache forks off child processes each of those child processes
will inherit Python already in an initialised state and also the
initial Python interpreter instance which was created. This therefore
avoids the need to perform initialisation of Python every time that a
child process is created.

In general this initial Python interpreter instance isn't actually
used though, as the default strategy of mod_python is to allocate
distinct Python interpreter instances for each VirtualHost, thereby at
least keeping applications running in distinct VirtualHost containers
separate so they don't interfere with each other.

Yes, these per-VirtualHost interpreter instances are only created on
demand in the child process, when the first request arrives that needs
one, so there is some first-time setup for that specific interpreter
instance at that point, but the main Python initialisation has already
occurred so this is minor. Most
importantly, once that interpreter instance is created for the
specific VirtualHost in the child process it is retained in memory and
used from one request to the next. If the handler for a request loads
in Python modules, those Python modules are retained in memory and do
not have to be reloaded on each request as you believe.
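
The retention described above rests on ordinary Python import caching
within a long-lived process. The following is only a minimal sketch of
that behaviour outside Apache, not mod_python code: the module name
demo_app_module, the handler function and simulate_requests are all
hypothetical names made up for the illustration.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical module standing in for something an application imports.
# A fresh object is created each time the module body is executed.
MODULE_SOURCE = "LOAD_TOKEN = object()\n"


def handler(module_name):
    """Simulate a request handler that imports a module on every request."""
    module = importlib.import_module(module_name)
    return module.LOAD_TOKEN


def simulate_requests(count):
    """Handle `count` requests in one process; return the token each one saw."""
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "demo_app_module.py"), "w") as f:
            f.write(MODULE_SOURCE)
        sys.path.insert(0, tmp)
        try:
            return [handler("demo_app_module") for _ in range(count)]
        finally:
            # Clean up so the demonstration can be repeated.
            sys.path.remove(tmp)
            sys.modules.pop("demo_app_module", None)


tokens = simulate_requests(3)
# If every request saw the same token, the module body ran only once.
print(all(token is tokens[0] for token in tokens))
```

Only the first simulated request pays the cost of executing the module
body; later requests in the same process get the cached module back.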

If you are concerned about the fact that you don't specifically know
when an interpreter instance will be first created in the child
process, i.e., because it would only be created upon the first request
arriving that actually required it, you can force interpreter
instances to be created as soon as the child process has been created
by using the PythonImport directive. What this directive allows you to
do is specify a Python module that should be preloaded into a specific
interpreter instance as soon as the child process is created. Because
the interpreter will need to exist, it will first be created before
the module is loaded thereby achieving the effect of precreating the
specific named interpreter instance.
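
As a rough sketch of what such a preloaded module might look like: the
module name preload_app, the interpreter name www.example.com and the
lookup table are all assumptions invented for this example, and the
directive line in the comment follows the documented form
"PythonImport module interpreter_name" as I understand it.

```python
# Hypothetical module, e.g. saved as preload_app.py somewhere on the
# Python path used by mod_python.
#
# Assumed Apache configuration (server config context), shown here as a
# comment only:
#
#     PythonImport preload_app www.example.com
#
# "www.example.com" is an assumed interpreter name; use whatever name
# your VirtualHost's interpreter actually has.

import time


def _build_lookup_table():
    """Stand-in for expensive start-up work (hypothetical)."""
    return {n: n * n for n in range(1000)}


# Module-level code runs once, when the module is first imported into
# an interpreter. With PythonImport that happens at child process
# start-up rather than during the first request.
LOOKUP_TABLE = _build_lookup_table()
LOADED_AT = time.time()


def lookup(n):
    """Used later by request handlers; the table is already built."""
    return LOOKUP_TABLE[n]
```

Handlers that later import preload_app in the same interpreter get the
already initialised module, so the first request does not pay for the
start-up work.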

So as to make sure you don't think that that first interpreter
instance created in the parent and inherited by the child processes is
completely wasted, it should be pointed out that the first interpreter
instance created by Python is sort of special. In general it shouldn't
matter, but there is one case where it does. This is where a third
party extension module for Python has not been written so as to work
properly in a context where there are multiple sub interpreters.
Specifically, if a third party extension module uses the simplified
API for GIL locking one can have problems using that module in
anything but the first interpreter instance created by Python. Thus,
the first instance is retained and in some cases it may be necessary
to force your application to run within the context of that
interpreter instance to get it to work when using such a module. If
you have to do this for multiple applications running under different
VirtualHost containers you lose your separation though, thus this is
only provided as a fallback when you don't have a choice.

I'll mention one other area in case you have the wrong idea about it
as well. In mod_python there is a feature for certain Python modules
to be reloaded. This feature is on by default, but turning it off is
always recommended in a production environment. To make it
quite clear, this feature does not mean that the modules which are
candidates for reloading will be reloaded on every request. Such
modules will only be reloaded if the code file for that module has
been changed, i.e., its modification time on disk has changed. In
mod_python 3.3 where this feature is a bit more thorough and robust,
it will also reload a candidate module if some child or descendant of
the module has been changed.
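
To illustrate the idea of reloading only when the file changes, here is
a minimal sketch of a modification-time check. It is not mod_python's
actual importer: the MtimeModuleCache class and the demo function are
hypothetical, and the descendant tracking mentioned for mod_python 3.3
is not modelled.

```python
import os
import tempfile
import types


class MtimeModuleCache:
    """Cache modules loaded from files; reload only when the mtime changes."""

    def __init__(self):
        self._entries = {}   # path -> (mtime_ns, module)
        self.load_count = 0  # how many times a module body was executed

    def get(self, path):
        mtime = os.stat(path).st_mtime_ns
        cached = self._entries.get(path)
        if cached is not None and cached[0] == mtime:
            return cached[1]  # file unchanged: reuse the cached module
        with open(path) as f:
            source = f.read()
        name = os.path.splitext(os.path.basename(path))[0]
        module = types.ModuleType(name)
        module.__file__ = path
        exec(compile(source, path, "exec"), module.__dict__)
        self._entries[path] = (mtime, module)
        self.load_count += 1
        return module


def demo():
    cache = MtimeModuleCache()
    t0 = 1_000_000_000 * 10**9   # arbitrary fixed timestamps, in nanoseconds
    t1 = t0 + 10 * 10**9
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "handler_mod.py")
        with open(path, "w") as f:
            f.write("VERSION = 1\n")
        os.utime(path, ns=(t0, t0))
        first = cache.get(path).VERSION
        second = cache.get(path).VERSION   # unchanged file: no reload
        loads_before_edit = cache.load_count

        with open(path, "w") as f:
            f.write("VERSION = 2\n")
        os.utime(path, ns=(t1, t1))        # make the mtime clearly differ
        third = cache.get(path).VERSION    # changed file: reloaded once
        return first, second, loads_before_edit, third, cache.load_count


print(demo())
```

Repeated lookups of an unchanged file cost one stat call and no
re-execution of the module; only an edit to the file triggers a reload.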

So, to summarise: interpreter instances, once created in the child
processes for a particular context, are retained in memory and used
from one request to the next. Further, any modules loaded by code for
a request handler are retained in memory and do not have to be reloaded
on each request. Even when module reloading is enabled in mod_python,
modules are only reloaded where a code file associated with that
module has been changed on disk.

Does that clarify any misunderstandings you have?

So far it looks like the only problem that has been identified is one
that I already know about, which is that there isn't any good
documentation out there which describes how it all works. As a result
there are a lot of people out there who describe wrongly how they
think it works and thus give others the wrong impression. I already
knew this, as I quite often find myself having to correct statements
on various newsgroups and in documentation for various Python web
frameworks. What is annoying is that even though you point out to some
of the Python web frameworks that what they state in their
documentation is wrong or misleading they don't correct it. Thus the
wrong information persists and keeps spreading the myth that there
must be some sort of problem where there isn't really. :-(

Graham