[SciPy-User] Poor scalability of embarrassingly parallel code with multiprocessing

Pauli Virtanen pav at iki.fi
Sat Oct 24 13:55:21 EDT 2009


Sat, 2009-10-24 at 19:01 +0530, Rohit Garg wrote:
[clip]
> I am attaching a very simple, embarrassingly parallel code. The
> problem is that it shows practically no speedup at all on my dual-core
> machine.
[clip]
> ===============================
> #test of Pool's map capabilities
> from multiprocessing import Pool
> import numpy
> import sys
> 
> procs=int(sys.argv[1])
> print procs
> def f(x):
>     index,probs=x
>     return index,2.0*probs
> 
> prob_samples=1000000
> 
> probX=numpy.linspace(0.2, 0.3, prob_samples)
> 
> Input=[(i,probX[i]) for i in xrange(prob_samples) ]
> 
> pool = Pool(processes=procs)
> 
> pool.map(f, Input)

Your code fails to scale for two reasons:

1) Each communication event between the parent process and a worker
   carries a fixed overhead.

   You trigger `prob_samples` communication events, so you pay that
   overhead 1000000 times.

   This is where your code spends most of its time (a rough timing
   sketch follows this list).

2) Your computational sub-problem is limited by memory bus speed:
   most of the time is taken by transfer of data between
   the main memory and CPU caches.
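
To get a feel for 1), here is a rough timing sketch (not measured on
your machine; the array size just mirrors your script). It times the
whole workload as a single vectorized pass in one process, which
typically finishes in milliseconds; so even a few microseconds of
per-sample IPC overhead, paid 1000000 times, will dominate:

# Rough timing sketch: the entire job as one vectorized, memory-bound
# pass in a single process (absolute numbers depend on your machine).
import time
import numpy

prob_samples = 1000000
probX = numpy.linspace(0.2, 0.3, prob_samples)

t0 = time.time()
result = 2.0*probX
print "single vectorized pass: %.4f s" % (time.time() - t0)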

In general, with this much data per CPU, you can suppress the
communication overhead by sending the samples in, say, 1000-element
blocks instead of one at a time.
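
As a minimal sketch of that blocking, assuming the same setup as your
script (the helper name `f_block` and the block size of 1000 are just
illustrative):

# Sketch only: ship ~1000 samples per task, so the per-call overhead is
# paid about 1000 times instead of 1000000 times, and numpy vectorizes
# the arithmetic within each block.
from multiprocessing import Pool
import numpy

prob_samples = 1000000
block = 1000
probX = numpy.linspace(0.2, 0.3, prob_samples)

def f_block(x):
    index, probs = x        # probs is now a 1000-element sub-array
    return index, 2.0*probs

blocks = [(i, probX[i:i+block]) for i in xrange(0, prob_samples, block)]

pool = Pool(processes=2)
results = pool.map(f_block, blocks)

Pool.map also accepts a chunksize argument, which batches the pickling
of per-sample items in much the same way.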

But because of 2), your problem is essentially unscalable. The CPU <->
main memory communication is a bottleneck that you cannot work around.

Here's an example function that does enough computation per task for
parallelization to pay off:

def f(x):
    index, probs = x
    # Make sure probs is a writable array, so the in-place ufunc call
    # below also works when probs is a single float from Input.
    probs = numpy.atleast_1d(probs).astype(float)
    for k in xrange(1000):
        numpy.cos(probs, probs)   # second argument is the output array
    return index, probs
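
A rough usage sketch for checking the scaling of this heavier f (the
smaller sample count and the chunk size are only so the test finishes
quickly; the actual speedup depends on your machine):

# Sketch: compare wall-clock time for 1 and 2 workers, reusing the f
# defined above.
import time
from multiprocessing import Pool
import numpy

small_input = [(i, p) for i, p in enumerate(numpy.linspace(0.2, 0.3, 5000))]

for procs in (1, 2):
    pool = Pool(processes=procs)
    t0 = time.time()
    pool.map(f, small_input, chunksize=50)
    pool.close()
    pool.join()
    print "%d processes: %.2f s" % (procs, time.time() - t0)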

-- 
Pauli Virtanen
