multiprocessing speedup

Peter Otten __peter__ at web.de
Tue Sep 29 03:24:26 EDT 2015


Rita wrote:

> I am using the multiprocessing with apply_async to do some work. Each task
> takes a few seconds but I have several thousand tasks. I was wondering if
> there is a more efficient method, especially since I plan to operate on
> large in-memory arrays (numpy).
> 
> Here is what I have now
> 
> 
> import multiprocessing as mp
> import random
> 
> def f(x):
>     count=0
>     for i in range(x):
>         x=random.random()
>         y=random.random()
>         if x*x + y*y<=1:
>             count+=1
> 
>     return count
> 
> def main():
>     resultObj=[]
>     n=10000
>     P=mp.Pool(2)
>     for arg in xrange(n):
>         resultObj.append(P.apply_async(f,(arg,)))
>     P.close()
>     P.join()
>     result = [ i.get() for i in resultObj ]
>     print sum(result)/(n)
> 
> if __name__=="__main__":
>     main()

This is much too early to worry about speed. 
First write a working version. 
Then measure to identify the bottlenecks.
Then optimise the bottlenecks (if any) and only the bottlenecks.
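For the measuring step, a rough sketch could look like the following (the
serial() helper is mine, not from your script): time the plain
single-process version first as a baseline, then profile it to see where
the time actually goes.

import cProfile
import random
import time

def f(x):
    # Monte Carlo loop from the original post
    count = 0
    for _ in range(x):
        a = random.random()
        b = random.random()
        if a*a + b*b <= 1:
            count += 1
    return count

def serial(n=10000):
    # plain single-process version -- the baseline any Pool variant has to beat
    return sum(f(arg) for arg in range(n))

if __name__ == "__main__":
    start = time.time()
    total = serial()
    print("serial run: %.2f seconds" % (time.time() - start))
    # per-function breakdown of where the time goes
    cProfile.run("serial()")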

> 1) Does multiprocessing do a fork for each task?

I don't think so. You can see this when you modify your script to return the 
process id instead of the result (there will only be two distinct ids).
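
A quick check along those lines (the throwaway f below just reports which
worker process ran the task):

import multiprocessing as mp
import os

def f(x):
    # throwaway version: report the worker's process id instead of a result
    return os.getpid()

if __name__ == "__main__":
    pool = mp.Pool(2)
    pids = pool.map(f, range(1000))
    pool.close()
    pool.join()
    # With Pool(2) you should see only two distinct ids,
    # i.e. the workers are forked once, not once per task.
    print(set(pids))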

But the data has to be passed around.

> 2) If so, I assume thats costly due to setup and teardown. Would this be
> the case?

I don't think so.

> 3) I plan to pass large arrays to function,f, therefore is there a more
> efficient method to achieve this?

Where do these arrays come from? If they are coming from a file, could you 
use separate scripts (or separate workers) to operate on part(s) of the 
file(s)? A sketch of that idea follows.
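
This is only a guess at your setup -- numpy.memmap, the file name and the
chunk size below are all made up -- but the point is that each worker maps
the file itself, so only a small integer offset has to be pickled and sent
over the pipe, never the big array:

import multiprocessing as mp
import numpy as np

FILENAME = "data.bin"   # hypothetical flat file of float64 values
CHUNK = 1000000         # values per task

def process_chunk(start):
    # each worker maps the file itself; only `start` travels to the worker
    data = np.memmap(FILENAME, dtype="float64", mode="r")
    part = data[start:start + CHUNK]
    return float(part.sum())

if __name__ == "__main__":
    size = np.memmap(FILENAME, dtype="float64", mode="r").shape[0]
    pool = mp.Pool(2)
    totals = pool.map(process_chunk, range(0, size, CHUNK))
    pool.close()
    pool.join()
    print(sum(totals))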

Of course such considerations are moot if most of the time is spent 
processing an array rather than passing it around. Which brings us back to 
the first and foremost point:

This is much too early to worry about speed.



