Python does not play well with others

Tue Feb 6 16:19:21 EST 2007

On Feb 5, 5:45 pm, "Graham Dumpleton" <grah... at dscpl.com.au> wrote:
> On Feb 6, 8:57 am, "sjdevn... at yahoo.com" <sjdevn... at yahoo.com> wrote:
>
> > On Feb 5, 12:52 pm, John Nagle <n... at animats.com> wrote:
>
> > > sjdevn... at yahoo.com wrote:
> > > > John Nagle wrote:
>
> > > >>Graham Dumpleton wrote:
>
> > > >>>On Feb 4, 1:05 pm, Paul Rubin <http://phr...@NOSPAM.invalid> wrote:
>
> > > >>>>"Paul Boddie" <p... at boddie.org.uk> writes:
> > > > >>     Realistically, mod_python is a dead end for large servers,
> > > > >>because Python isn't really multi-threaded.  The Global Interpreter
> > > > >>Lock means that a multi-core CPU won't help performance.
>
> > > > > The GIL doesn't affect separate processes, and any large server that
> > > > cares about stability is going to be running a pre-forking MPM no
> > > > matter what language they're supporting.
>
> > >    Pre-forking doesn't reduce load; it just improves responsiveness.
> > > You still pay for loading all the modules on every request.
>
> > No, you don't.  Each server is persistent and serves many requests--
> > it's not at all like CGI, and it reuses the loaded Python image.
>
> > So if you have, say, an expensive to load Python module, that will
> > only be executed once for each server you start...e.g. if you have
> > Apache configured to accept up to 50 connections, the module will be
> > run at most 50 times; once each of the 50 processes has started up,
> > they stick around until you restart Apache, unless you've configured
> > Apache to only serve X requests in one process before restarting it.
> > (The one major feature that mod_python _is_ missing is the ability to
> > do some setup in the Python module prior to forking.  That would make
> > restarting Apache somewhat nicer).
>
> There would be a few issues with preloading modules before the main
> Apache child process performed the fork.
>
> The first is whether it would be possible for code to be run with
> elevated privileges, given that the main Apache process is usually
> started as root. I'm not sure at what point it switches to the special
> user Apache generally runs as, or whether the way the switch is done in
> the main process is enough to prevent code from regaining root
> privileges, so that would need to be looked into.

In our case, the issue is this: we load a ton of info at server
restart, from the database.  Some of it gets processed a bit based on
configuration files and so forth.  If this were done in my own C
server, I'd do all of that and set up the (read-only) runtime data
structures prior to forking.  That would mean that:
a) The processing time would be lower, since you'd do the
preprocessing only once; and
b) The memory footprint could be lower if large data structures were
created prior to the fork; they'd be in shared copy-on-write pages.
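
To make a) and b) concrete, here is a rough sketch of the pattern in plain
Python, outside of Apache. It is only an illustration under assumptions:
expensive_startup, run_prefork and startup_log are made-up names, the
"database load" is faked, and it relies on os.fork, so it is Unix-only. It
is not how mod_python behaves.

```python
import os

startup_log = []  # hypothetical: records each time the expensive load runs


def expensive_startup():
    """Stand-in (assumption) for loading and preprocessing database info."""
    startup_log.append(os.getpid())
    return {"answer": sum(range(1000))}


def run_prefork(num_children=3):
    # Do the expensive work once, in the parent, before any fork.
    data = expensive_startup()
    results = []
    for _ in range(num_children):
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: `data` is inherited from the parent; nothing is reloaded.
            try:
                os.close(read_fd)
                os.write(write_fd, str(data["answer"]).encode())
                os.close(write_fd)
            finally:
                os._exit(0)  # never fall back into the parent's code path
        # Parent: read the child's report, then reap the child.
        os.close(write_fd)
        with os.fdopen(read_fd, "rb") as reader:
            results.append(int(reader.read()))
        os.waitpid(pid, 0)
    return results
```

Each child reports the value it inherited, while startup_log in the parent
only ever gets one entry, which is the saving described in a).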

b) isn't really possible in Python as far as I can tell (taking a
reference to an object updates its reference count, which is a write to
the page holding the object, so everything's going to get copied into
your process eventually), but a) would be very nice to have.
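
The reference-count point can be seen directly in CPython. This is a small
sketch, assuming CPython (where the count lives in the object itself);
refcount_delta is a made-up helper name. It shows that simply storing
another reference to an object changes the object's own count, which is why
"read-only" data still gets written to after a fork.

```python
import sys


def refcount_delta(obj):
    """Return how much obj's reference count grows when one more
    reference to it is stored (CPython-specific behaviour)."""
    holders = []
    before = sys.getrefcount(obj)
    holders.append(obj)  # store one additional reference to obj
    after = sys.getrefcount(obj)
    return after - before


shared_table = ["row-%d" % i for i in range(3)]
delta = refcount_delta(shared_table)
```

A non-zero delta means the object's header was modified even though the
list contents were never changed.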

> The second issue is that there can be multiple Python interpreters
> ultimately created depending on how URLs are mapped, thus it isn't
> just an issue with loading a module once, you would need to create all
> the interpreters you think might need it and preload it into each. All
> this will blow out the memory size of the main Apache process.

It'll blow out the children, too, though.  Most real-world
implementations I've seen just use one interpreter, so even a solution
that didn't account for this would be very useful in practice.

> There is also much more possibility for code, if it runs up extra
> threads, to interfere with the operation of the Apache parent process.

Yeah, you don't want to run threads in the parent (I'm not sure many
big mission-critical sites use multiple threads anyway, certainly none
of the 3 places I've worked at did).  You don't want to allow
untrusted code.  You have to be careful, and you should treat anything
run there as part of the server configuration.

But it would still be mighty nice.  We're considering migrating to
another platform (still Python-based) because of this issue, but
that's only because we've gotten big enough (in terms of "many big fat
servers sucking up CPU on one machine", not "tons of traffic") that
it's finally an issue.  mod_python is still very nice, and frankly if
our startup code were a little less piggish it might not be an issue
even now.  On the other hand, we've gotten a lot of flexibility out of
our approach, and the code base is up to 325,000 lines of Python or
so.  We might be able to refactor things to cut down on startup costs,
but in general a way to call startup code only once seems like the
Right Thing(TM).