[Numpy-discussion] Using multiprocessing (shared memory) with numpy array multiplication

Robin robince at gmail.com
Thu Jun 16 15:19:02 EDT 2011


The fact that you are still passing the myutil.arrayList argument to
pool.map means the data is still being pickled: every argument to
pool.map is pickled so it can be sent to the worker subprocesses. You need
to change the function so that it accesses the data directly, and then just
pass indices to pool.map.
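
For example, something along these lines (an untested sketch -- I'm assuming
myutil builds arrayList at import time, before the fork, and I'm using a plain
elementwise multiply-and-sum as a stand-in for your inner product function):

import multiprocessing
import numpy as np

import myutil   # myutil.arrayList is created at import time, before the fork

def inner_product_pair(ij):
    # Only the small (i, j) index tuple gets pickled; the forked workers
    # read the arrays straight out of the module state they inherited.
    i, j = ij
    return np.sum(myutil.arrayList[i] * myutil.arrayList[j])

if __name__ == '__main__':
    pool = multiprocessing.Pool(8)
    n = len(myutil.arrayList)
    pairs = [(i, j) for i in range(n) for j in range(i, n)]
    results = pool.map(inner_product_pair, pairs)
    pool.close()
    pool.join()

The pickling cost is then per index pair rather than per array, which is
negligible.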

Cheers

Robin

On Thu, Jun 16, 2011 at 9:05 PM, Brandt Belson <bbelson at princeton.edu> wrote:
> Hi all,
> Thanks for the replies. As mentioned, I'm parallelizing so that I can take
> many inner products simultaneously (which I agree is embarrassingly
> parallel). The library I'm writing asks the user to supply a function that
> takes two objects and returns their inner product. After all the discussion
> though it seems this is too simplistic of an approach. Instead, I plan to
> write this part of the library as if the inner product function supplied by
> the user uses all available cores (with numpy and/or numexpr built with MKL
> or LAPACK).
> As far as using fortran or C and openMP, this probably isn't worth the time
> it would take, both for me and the user.
> I've tried increasing the array sizes and found the same trends, so the
> slowdown isn't only because the arrays are too small to see the benefit of
> multiprocessing. I wrote the code to be easy for anyone to experiment with,
> so feel free to play around with what is included in the profiling, the
> sizes of arrays, functions used, etc.
> I also tried using handythread.foreach with arraySize = (3000,1000), and
> found the following:
> No shared memory, numpy array multiplication took 1.57585811615 seconds
> Shared memory, numpy array multiplication took 1.25499510765 seconds
> This is definitely an improvement over multiprocessing, but without knowing
> any better, I was hoping to see roughly an 8x speedup on my 8-core
> workstation.
> Based on what Chris sent, it seems there is some large overhead caused by
> multiprocessing pickling numpy arrays. To test what Robin mentioned:
>> If you are on Linux or Mac then fork works nicely, so you have read-only
>> shared memory; you just have to put it in a module before the fork
>> (so before pool = Pool()) and then all the subprocesses can access it
>> without any pickling required, i.e.
>>
>> myutil.data = listofdata
>>
>> # define before creating the Pool so the forked workers can find it
>> def mymapfunc(i):
>>     return mydatafunc(myutil.data[i])
>>
>> p = multiprocessing.Pool(8)
>> p.map(mymapfunc, range(len(myutil.data)))
> I tried creating the arrayList in the myutil module and using
> multiprocessing to find the inner products of myutil.arrayList; however,
> this was still slower than not using multiprocessing, so I believe there is
> still some large overhead. Here are the results:
> No shared memory, numpy array multiplication took 1.55906510353 seconds
> Shared memory, numpy array multiplication took 9.82426381111 seconds
> Shared memory, myutil.arrayList numpy array multiplication took
> 8.77094507217 seconds
> I'm attaching this code.
> I'm going to work around this numpy/multiprocessing behavior by using
> numpy/numexpr built with MKL or LAPACK. It would be good to know exactly
> what's causing this, though. It would be nice if there were a way to get the
> ideal speedup via multiprocessing, regardless of the internal workings of
> the single-threaded inner product function, as this was the behavior I
> expected. I imagine other people might come across similar situations, but
> again I'm going to try to get around this by letting MKL or LAPACK make use
> of all available cores.
> Thanks again,
> Brandt
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
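
On the question of what is actually causing the overhead and whether there is
a way to get the ideal speedup via multiprocessing: the slow path is pickling
each array and shipping it through a pipe to the workers. One way to avoid
that entirely is to allocate the buffers in shared memory with
multiprocessing.Array and view them as numpy arrays with numpy.frombuffer, so
the parent and the workers all operate on the same memory. A rough, untested
sketch (the sizes and the dot-product inner product are just placeholders,
and this relies on fork, so Linux/Mac only):

import multiprocessing
import numpy as np

n_arrays, n_elements = 8, 3000 * 1000   # placeholder sizes

# Allocate lock-free shared buffers before the pool is created, then view
# them as numpy arrays; forked workers inherit references to the same memory.
shared_bufs = [multiprocessing.Array('d', n_elements, lock=False)
               for _ in range(n_arrays)]
arrays = [np.frombuffer(buf, dtype=np.float64) for buf in shared_bufs]
for a in arrays:
    a[:] = np.random.random(n_elements)

def inner_product_pair(ij):
    # Only the index tuple is pickled; the array data never leaves
    # the shared buffers.
    i, j = ij
    return np.dot(arrays[i], arrays[j])

if __name__ == '__main__':
    pool = multiprocessing.Pool(8)
    pairs = [(i, j) for i in range(n_arrays) for j in range(i, n_arrays)]
    results = pool.map(inner_product_pair, pairs)
    pool.close()
    pool.join()

Whether this gets you the 8x you are after still depends on the arrays being
large enough that the per-task dispatch overhead is small compared to the
arithmetic.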


