[SciPy-user] Python on Intel Xeon Dual Core Machine

Anne Archibald peridot.faceted at gmail.com
Tue Feb 5 23:09:07 EST 2008


On 05/02/2008, Lorenzo Isella <lorenzo.isella at gmail.com> wrote:

> And thanks everybody for the many replies.
> I partially solved the problem by adding some extra RAM.
> A rather primitive solution, but now my desktop does not use any swap memory and the code runs faster.
> Unfortunately, the nature of the code does not easily lend itself to being split up into easier tasks.
> However, apart from the parallel python homepage, what is your recommendation for a beginner who wants a smattering of parallel computing (I have in mind C and Python at the moment)?

Really the first thing to do is figure out what's actually taking the
time in your program. The python profiler has its limitations, but
it's still worth using; even just sprinkling in a few "print
time.time()" calls can tell you a lot. If memory is a problem - as it
was in your case - and you're swapping to disk, parallelizing your
code may make things run slower. (Swapping is, as you probably
noticed, *incredibly* slow, so anything that makes you do more of it,
like trying to cram more stuff in memory at once, is going to make
things much slower.) Even if you're already pretty sure you know
which parts are slow, instrumenting the code will tell you how much
difference the various parallelization tricks you try are making.
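
For concreteness, here is a minimal sketch of the kind of
instrumentation I mean; load_data and main_computation are just
hypothetical stand-ins for whatever steps your own code performs:

import cProfile
import time

def load_data():
    # placeholder work standing in for your real input step
    return [i * 0.5 for i in range(10 ** 6)]

def main_computation(data):
    # placeholder work standing in for your real calculation
    return sum(x * x for x in data)

def run():
    t0 = time.time()
    data = load_data()
    print("load_data took %.2f s" % (time.time() - t0))

    t0 = time.time()
    result = main_computation(data)
    print("main_computation took %.2f s" % (time.time() - t0))
    return result

if __name__ == "__main__":
    # the profiler adds per-function timings on top of the prints
    cProfile.run("run()")

Even this much will usually tell you whether the time is going into
reading data, into the core calculation, or somewhere unexpected.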

What kind of parallelization you should do really depends on what's
slow in your program, and on what you can change. At a conceptual
level, some operations parallelize easily and others require much
thought.
For example, if you're doing something ten times, and each time is
independent of the others, that can be easily parallelized (that's
what my little script handythread does). If you're doing something
more complicated - sorting a list, say - that requires careful
sequencing, parallelizing it is going to be hard.

Start by thinking about the time-consuming tasks you identified above.
Does each task depend on the result of a previous task? If not, you
can run them concurrently, using something like handythread, python's
threading module, or parallel python.
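
As a minimal sketch of that idea, using only the standard threading
module (process_one_file and the file names are hypothetical
stand-ins; as discussed below, the GIL limits the speedup for
pure-python work, but I/O-bound or array-heavy tasks can still
benefit):

import threading

def process_one_file(filename, results, index):
    # placeholder task: count the lines in one input file
    with open(filename) as f:
        results[index] = sum(1 for line in f)

filenames = ["run1.dat", "run2.dat", "run3.dat"]  # hypothetical inputs
results = [None] * len(filenames)

threads = [threading.Thread(target=process_one_file,
                            args=(name, results, i))
           for i, name in enumerate(filenames)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)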

If they do depend on each other, start looking at each time-consuming
task in turn. Could it be parallelized? This can mean one of two
things: you could write code to make the task run in parallel, or you
could make python use a parallelized linear-algebra library so that
(say) matrix multiplication runs on several CPUs automatically (this
is what the people who suggest MKL have in mind). More generally,
could the task be made to run faster in other ways? If you're reading
text files, could you read binaries? If you're calling an external
program thousands of times, could you do the work in python, or call
it only once with more input?
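
As an example of the text-versus-binary point, numpy can read its own
binary format far faster than it can parse text; here "results.txt"
and "results.npy" are hypothetical file names:

import numpy as np

data = np.loadtxt("results.txt")   # slow: parses the file line by line
np.save("results.npy", data)       # write the same array in binary form

data = np.load("results.npy")      # fast: reads the binary file directly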

Parallel programming is a massive, complicated field, and many
high-powered software tools exist to take advantage of it.
Unfortunately, python has a limitation in this area: the Global
Interpreter Lock. Basically it means no two CPUs can be running python
code at the same time. This means that you get no speedup at all by
parallelizing your python code - with a few important exceptions:
while one thread is doing an array operation, other threads can run
python code, and while one thread is waiting for I/O (reading from
disk, for example), other threads can run python code. Parallel python
is a toolkit that can avoid this problem by running multiple python
interpreters (though I have little experience with it).
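
Here is a minimal sketch of the array-operation exception: numpy can
release the GIL inside a large operation like dot(), so two threads
doing matrix multiplications may genuinely run on two CPUs (whether
they actually do depends on the operation and on how your numpy was
built):

import threading
import numpy as np

def multiply(a, b, out, index):
    # the GIL can be released inside dot(), letting the threads overlap
    out[index] = np.dot(a, b)

n = 1000
a = np.random.rand(n, n)
b = np.random.rand(n, n)
out = [None, None]

threads = [threading.Thread(target=multiply, args=(a, b, out, i))
           for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()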

Generally, parallelization works best when you don't need to move
much data around. The fact that you're running short of memory
suggests that you are moving a lot of data around. Parallelization
also always requires some restructuring of your code, and more
restructuring the more efficiency you want.

Anne


