threading support in python
sjdevnull at yahoo.com
Tue Sep 5 11:23:45 EDT 2006
Sandra-24 wrote:
> > You seem to be confused about the nature of multiple-process
> > programming.
> >
> > If you're on a modern Unix/Linux platform and you have static read-only
> > data, you can just read it in before forking and it'll be shared
> > between the processes.
>
> Not familiar with *nix programming, but I'll take your word on it.
You can do the same on Windows if you use NtCreateProcessEx to create the
new processes and pass a NULL SectionHandle. I don't think this helps
in your case, but I was correcting your impression that "you'd have to
physically double the computer's memory for a dual core, or quadruple
it for a quadcore". That's just not even near true.
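To make the fork point concrete, here is a minimal sketch, assuming a Unix platform where os.fork is available. The LOOKUP table and the child_sum_via_fork helper are hypothetical names made up for illustration. The data is built once in the parent, and the child reads it without reloading or pickling anything.

```python
import os

# Hypothetical read-only table, built once in the parent before forking.
LOOKUP = {n: n * n for n in range(100_000)}


def child_sum_via_fork(keys):
    """Fork a child that reads LOOKUP and reports a sum back over a pipe."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()  # Unix only
    if pid == 0:
        # Child: LOOKUP is already in our address space, inherited from the parent.
        os.close(read_fd)
        total = sum(LOOKUP[k] for k in keys)
        os.write(write_fd, str(total).encode())
        os.close(write_fd)
        os._exit(0)  # leave immediately, skipping the parent's cleanup handlers
    # Parent: collect the answer and reap the child.
    os.close(write_fd)
    with os.fdopen(read_fd) as pipe:
        result = int(pipe.read())
    os.waitpid(pid, 0)
    return result
```

One caveat: the pages are shared copy-on-write, and CPython updates reference counts even when it only reads an object, so some of those pages will get copied as the child touches them. How much stays shared in practice is something you would have to measure.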
> > Threads are way overused in modern multiexecution programming. The
>
> <snip>
>
> > It used to run on windows with multiple processes. If it really won't
> > now, use an older version or contribute a fix.
>
> First of all I'm not in control of spawning processes or threads.
> Apache does that, and apache has no MPM for windows that uses more than
> 1 process.
As I said, Apache used to run on Windows with multiple processes; using
a version that supports that is one option. There are good reasons not
to do that, though, so you could be stuck with threads.
> Secondly "Superior" is definitely a matter of opinion. Let's
> see how you would define superior.
Having memory protection is superior to not having it--OS designers
spent years implementing it, why would you toss out a fair chunk of it?
Being explicit about what you're sharing is generally better than not.
But as I said, threads are a better solution if you're sharing the vast
majority of your memory and have complex data structures to share.
When you're starting a new project, really think about whether they're
worth the considerable tradeoffs, though, and consider the merits of a
multiprocess solution.
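As a rough illustration of what "explicit about what you're sharing" can look like, here is a sketch using the multiprocessing module from later Python versions (it did not exist when this was written). It assumes a Unix platform with the "fork" start method, and the names work, run and private_counter are hypothetical. Only the Value object is shared; the ordinary global stays private to each process.

```python
import multiprocessing as mp

# Assumption: Unix, where the "fork" start method is available.
ctx = mp.get_context("fork")

private_counter = 0  # ordinary global: every process gets its own copy


def work(shared_counter, n):
    global private_counter
    for _ in range(n):
        private_counter += 1  # changes only this process's copy
        with shared_counter.get_lock():
            shared_counter.value += 1  # explicitly shared, explicitly locked


def run(workers=4, n=1000):
    shared_counter = ctx.Value("i", 0)  # the one thing we chose to share
    procs = [ctx.Process(target=work, args=(shared_counter, n))
             for _ in range(workers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    # The parent sees every shared increment and none of the private ones.
    return shared_counter.value, private_counter
```

With threads, both counters would be visible to everyone and both would need locking. Here a stray write to private_counter in a worker cannot corrupt the parent.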
> 3) Rewrite my codebase to use some form of shared memory. This would be
> a terrible nightmare that would take at least a month of development
> time and a lot of heavy rewriting. It would be very difficult, but I'll
> grant that it may work if done properly with only small performance
> losses.
It's almost certainly not worth rewriting a large established
codebase.
> I would find an easier time, I think, porting mod_python to .net and
> leaving that GIL behind forever. Thankfully, I'm not considering such
> drastic measures - yet.
The threads vs. processes thing isn't strongly related to the
implementation language (though a few languages like Java basically
take the decision out of your hands). Moving to .NET leaves you with
the same questions to consider before making the decision--just working
in C# doesn't somehow make threads the right choice all the time.
> Why on earth would I want to do all of that work? Just because you want
> to keep this evil thing called a GIL?
No, I agreed that the GIL is a bad thing for some applications.
> My suggestion is in python 3
> ditch the ref counting, use a real garbage collector
I disagree with this, though. The benefits of deterministic GC are
huge and I'd like to see ref-counting semantics as part of the language
definition. That's a debate I just had in another thread, though, and
don't want to repeat.
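For anyone who hasn't seen what deterministic cleanup buys you, a small sketch of the behavior in CPython specifically. The Resource class and the events list are hypothetical. The language definition does not currently promise this ordering, which is exactly why I'd like it written down.

```python
events = []


class Resource:
    """Hypothetical resource that logs when it is acquired and released."""

    def __init__(self, name):
        self.name = name
        events.append("open " + name)

    def __del__(self):
        # Under CPython's reference counting this runs as soon as the
        # last reference goes away, not at some later collection pass.
        events.append("close " + self.name)


def use_resource():
    r = Resource("a")
    events.append("use " + r.name)
    # r is the only reference; it is dropped when the function returns.


use_resource()
events.append("after call")
```

Under CPython the "close a" entry lands before "after call". With a tracing collector and no ref counting, the close could happen arbitrarily later, and you'd have to manage the resource's lifetime by hand.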
> > Now, the GIL is independent of this; if you really need threading in
> > your situation (you share almost everything and have hugely complex
> > data structures that are difficult to maintain in shm) then you're
> > still going to run into GIL serialization. If you're doing a lot of
> > work in native code extensions this may not actually be a big
> > performance hit; if not, it can be pretty bad.
>
> Actually, I'm not sure I understand you correctly. You're saying that
> in an environment like apache (with 250 threads or so) and my hugely
> complex shared data structures, that the GIL is going to cause a huge
> performance hit?
I didn't say that. It can be a big hit or it can be unnoticeable. It
depends on your application. You have to benchmark to know for sure.
But if you're trying to make a guess: if you're doing a lot of heavy
lifting in native modules then the GIL may be released during those
calls, and you might get good multithreading performance. If you're
doing lots of I/O requests the GIL is generally released during those
and things will be fine. If you're doing lots of heavy crunching in
Python, the GIL is probably held and can be a big performance issue.
Since your app sounds like it's basically written, there's not much
cause to guess; benchmark it and see if it's fast enough or not. If
so, don't spend time and effort optimizing.
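If you want a quick feel for the pure-Python case before benchmarking the real app, something like the following sketch would do. The crunch function is a made-up stand-in for CPU-bound Python code, and the two timing helpers are hypothetical. It runs the same jobs sequentially and then in threads so you can compare wall-clock times on your own machine.

```python
import threading
import time


def crunch(n):
    """Made-up CPU-bound work done entirely in Python bytecode."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def timed_sequential(jobs, n):
    start = time.perf_counter()
    results = [crunch(n) for _ in range(jobs)]
    return results, time.perf_counter() - start


def timed_threaded(jobs, n):
    results = [None] * jobs

    def worker(slot):
        results[slot] = crunch(n)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(jobs)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, time.perf_counter() - start


if __name__ == "__main__":
    _, seq = timed_sequential(4, 2_000_000)
    _, thr = timed_threaded(4, 2_000_000)
    print("sequential: %.2fs  threaded: %.2fs" % (seq, thr))
```

If the threaded time is no better than the sequential one, the GIL is serializing the work. Swap crunch for a call into your actual native module or I/O path and the picture may change completely, which is why the real benchmark is the one that matters.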