threading support in python

Tue Sep 5 15:13:47 EDT 2006

sjdevnull at yahoo.com wrote:
> You can do the same on Windows if you use CreateProcessEx to create the
> new processes and pass a NULL SectionHandle.  I don't think this helps
> in your case, but I was correcting your impression that "you'd have to
> physically double the computer's memory for a dual core, or quadruple
> it for a quadcore".  That's just not even near true.

Sorry, my bad. What I meant to say is that for my application I would
have to increase the memory linearly with the number of cores. I have
about 100mb of memory that could be shared between processes, but
everything else would really need to be duplicated.

> As I said, Apache used to run on Windows with multiple processes; using
> a version that supports that is one option.  There are good reasons not
> to do that, though, so you could be stuck with threads.

I'm not sure it has done that since the 1.3 releases. mod_python will
work for that, but involves going way back in it's release history as
well. I really don't feel comfortable with that, and I don't doubt I'd
give up a lot of things I'd miss.

> Having memory protection is superior to not having it--OS designers
> spent years implementing it, why would you toss out a fair chunk of it?
>  Being explicit about what you're sharing is generally better than not.

Actually, I agree. If shared memory will prove easier, then why not use
it, if the application lends itself to that.

> But as I said, threads are a better solution if you're sharing the vast
> majority of your memory and have complex data structures to share.
> When you're starting a new project, really think about whether they're
> worth the considerable tradeoffs, though, and consider the merits of a
> multiprocess solution.

There are merits, the GIL being one of those. I believe I can fairly
easily rework things into a multi-process environment by duplicating
memory. Over time I can make the memory usage more efficient by sharing
some data structures out, but that may not even be necessary. The
biggest problem is learning my way around Linux servers. I don't think
I'll choose that option initially, but I may work on it as a project in
the future. It's about time I got more familiar with Linux anyway.

> It's almost certainly not worth rewriting a large established
> codebase.

Lazy me is in perfect agreement.

> I disagree with this, though.  The benefits of deterministic GC are
> huge and I'd like to see ref-counting semantics as part of the language
> definition.  That's a debate I just had in another thread, though, and
> don't want to repeat.

I just took it for granted that a GC like Java and .NET use is better.
I'll dig up that thread and have a look at it.

> I didn't say that.  It can be a big hit or it can be unnoticeable.  It
> depends on your application.  You have to benchmark to know for sure.
>
> But if you're trying to make a guess: if you're doing a lot of heavy
> lifting in native modules then the GIL may be released during those
> calls, and you might get good multithreading performance.  If you're
> doing lots of I/O requests the GIL is generally released during those
> and things will be fine.  If you're doing lots of heavy crunching in
> Python, the GIL is probably held and can be a big performance issue.

I don't do a lot of work in native modules, other than the standard
library things I use, which doesn't count as heavy lifting. However I
do a fair amount of database calls, and either the GIL is released by
MySQLdb, or I'll contribute a patch so that it is. At any rate, I will
measure, and I suspect the GIL will not be an issue.

-Sandra