threading support in python

sjdevnull at yahoo.com
Tue Sep 5 11:23:45 EDT 2006


Sandra-24 wrote:
> > You seem to be confused about the nature of multiple-process
> > programming.
> >
> > If you're on a modern Unix/Linux platform and you have static read-only
> > data, you can just read it in before forking and it'll be shared
> > between the processes..
>
> Not familiar with *nix programming, but I'll take your word on it.

You can do the same on Windows if you use CreateProcessEx to create the
new processes and pass a NULL SectionHandle.  I don't think this helps
in your case, but I was correcting your impression that "you'd have to
physically double the computer's memory for a dual core, or quadruple
it for a quadcore".  That's nowhere near true.
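
To make the original fork() point concrete on the Unix side, here's a
minimal sketch (Unix-only, and the table is a made-up stand-in for real
data): load the read-only data in the parent, fork the workers, and the
children read it without the memory being physically duplicated.
CPython's refcount updates will dirty a few pages, but the bulk stays
shared.

import os

# Load the big read-only data once, before forking.  On Unix/Linux the
# children share these pages copy-on-write, so memory use does not
# multiply with the number of worker processes.
BIG_TABLE = dict((i, str(i) * 10) for i in range(100000))

def lookup(key):
    # Reads are served from the parent's pages; nothing substantial is
    # copied as long as the children only read.
    return BIG_TABLE[key]

children = []
for _ in range(4):
    pid = os.fork()
    if pid == 0:                # child process
        print(lookup(42))
        os._exit(0)
    children.append(pid)

for pid in children:
    os.waitpid(pid, 0)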

> > Threads are way overused in modern multiexecution programming.  The
>
> <snip>
>
> > It used to run on windows with multiple processes.  If it really won't
> > now, use an older version or contribute a fix.
>
> First of all I'm not in control of spawning processes or threads.
> Apache does that, and apache has no MPM for windows that uses more than
> 1 process.

As I said, Apache used to run on Windows with multiple processes; using
a version that supports that is one option.  There are good reasons not
to do that, though, so you could be stuck with threads.

> Secondly "Superior" is definately a matter of opinion. Let's
> see how you would define superior.

Having memory protection is superior to not having it--OS designers
spent years implementing it, so why would you toss out a fair chunk of
it?  Being explicit about what you're sharing is generally better than
not.


But as I said, threads are a better solution if you're sharing the vast
majority of your memory and have complex data structures to share.
When you're starting a new project, really think about whether they're
worth the considerable tradeoffs, though, and consider the merits of a
multiprocess solution.
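
For what it's worth, here's a rough sketch of what explicit sharing
between processes can look like (mmap is just one option, and the
layout here is invented for the example): the parent maps an anonymous
shared region before forking, and the child writes a result into it for
the parent to read.

import mmap
import os
import struct

# One page of explicitly shared, writable memory.  mmap(-1, ...) is an
# anonymous mapping; on Unix its default MAP_SHARED flag means a forked
# child writes into the same physical pages the parent sees.
shared = mmap.mmap(-1, 4096)

pid = os.fork()
if pid == 0:
    # Child: leave a result at a known offset for the parent to read.
    shared[0:4] = struct.pack("i", 1234)
    os._exit(0)

os.waitpid(pid, 0)
print(struct.unpack("i", shared[0:4])[0])    # prints 1234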

> 3) Rewrite my codebase to use some form of shared memory. This would be
> a terrible nightmare that would take at least a month of development
> time and a lot of heavy rewriting. It would be very difficult, but I'll
> grant that it may work if done properly with only small performance
> losses.

It's almost certainly not worth rewriting a large established
codebase.


> I would find an easier time, I think, porting mod_python to .net and
> leaving that GIL behind forever. Thankfully, I'm not considering such
> drastic measures - yet.

The threads vs. processes thing isn't strongly related to the
implementation language (though a few languages like Java basically
take the decision out of your hands). Moving to .NET leaves you with
the same questions to consider before making the decision--just working
in C# doesn't somehow make threads the right choice all the time.

> Why on earth would I want to do all of that work? Just because you want
> to keep this evil thing called a GIL?

No, I agreed that the GIL is a bad thing for some applications.

> My suggestion is in python 3
> ditch the ref counting, use a real garbage collector

I disagree with this, though.  The benefits of deterministic GC are
huge and I'd like to see ref-counting semantics as part of the language
definition.  That's a debate I just had in another thread, though, and
don't want to repeat.
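
To show what I mean by deterministic, though (a toy example of my own,
not anything from your code): with CPython's refcounting, the cleanup
below runs at a known point--the instant the last reference goes
away--rather than whenever a collector gets around to it.

class Resource(object):
    """Toy resource whose cleanup should happen at a known point."""
    def __init__(self, name):
        self.name = name
    def __del__(self):
        # With refcounting this runs as soon as the last reference dies.
        print("releasing %s" % self.name)

def use():
    r = Resource("handle-1")
    # ... work with r ...
    # When use() returns, r's refcount drops to zero and __del__ runs
    # immediately, before the next statement in the caller executes.

use()
print("after use()")   # in CPython, "releasing handle-1" prints first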

> > Now, the GIL is independent of this; if you really need threading in
> > your situation (you share almost everything and have hugely complex
> > data structures that are difficult to maintain in shm) then you're
> > still going to run into GIL serialization.  If you're doing a lot of
> > work in native code extensions this may not actually be a big
> > performance hit, if not it can be pretty bad.
>
> Actually, I'm not sure I understand you correctly. You're saying that
> in an environment like apache (with 250 threads or so) and my hugely
> complex shared data structures, that the GIL is going to cause a huge
> performance hit?

I didn't say that.  It can be a big hit or it can be unnoticeable.  It
depends on your application.  You have to benchmark to know for sure.

But if you're trying to make a guess: if you're doing a lot of heavy
lifting in native modules then the GIL may be released during those
calls, and you might get good multithreading performance.  If you're
doing lots of I/O requests the GIL is generally released during those
and things will be fine.  If you're doing lots of heavy crunching in
Python, the GIL is probably held and can be a big performance issue.
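
If you want to see the two regimes for yourself, something along these
lines will show it (the loop size is arbitrary and sleep() just stands
in for a blocking I/O call): the CPU-bound threads don't overlap, the
I/O-style threads do.

import threading
import time

def cpu_work():
    # Pure-Python crunching: the GIL is held almost the whole time, so
    # two of these in threads take about as long as running them
    # back to back.
    total = 0
    for i in range(5000000):
        total += i

def io_work():
    # Sleeping (like most blocking I/O) releases the GIL, so two of
    # these in threads overlap and finish in about 1 second, not 2.
    time.sleep(1.0)

def timed(func, nthreads):
    threads = [threading.Thread(target=func) for _ in range(nthreads)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start

print("cpu x2: %.2fs" % timed(cpu_work, 2))
print("io  x2: %.2fs" % timed(io_work, 2))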

Since your app sounds like it's basically written, there's not much
cause to guess; benchmark it and see if it's fast enough or not.  If
so, don't spend time and effort optimizing.



