[Numpy-discussion] Using multiprocessing (shared memory) with numpy array multiplication

Robin robince at gmail.com
Fri Jun 17 08:08:36 EDT 2011


I didn't have time yesterday, but the attached script illustrates what I
mean about putting the shared data in a module (it should work with the
previous myutil).
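
For concreteness, here is a minimal sketch of the idea, assuming Linux/Mac
fork semantics; the attached poolmap.py may differ in its details, myutil is
assumed to be an otherwise empty helper module, and the array sizes,
baseArray and the inner product below are just illustrative stand-ins:

import time
import multiprocessing
import numpy as np

import myutil   # plain module used only to hold the shared data

# Attach the data to the module *before* the fork, so the worker processes
# inherit it read-only via copy-on-write and nothing gets pickled.
myutil.arrayList = [np.random.random((3000, 1000)) for _ in range(8)]
myutil.baseArray = np.random.random((3000, 1000))

def inner(i):
    # Only the integer index is pickled; the arrays are read from the module.
    return np.sum(myutil.arrayList[i] * myutil.baseArray)

if __name__ == '__main__':
    t = time.time()
    serial = [inner(i) for i in range(len(myutil.arrayList))]
    print('Not threaded:  %f' % (time.time() - t))

    pool = multiprocessing.Pool(8)  # the fork happens here, after the data exists
    t = time.time()
    parallel = pool.map(inner, range(len(myutil.arrayList)))
    print('Using 8 processes: %f' % (time.time() - t))
    pool.close()
    pool.join()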

I don't get a big speed-up, but at least it is faster using multiple
subprocesses:

Not threaded:  0.450406074524
Using 8 processes: 0.282383

Adding a 1 s sleep inside the inner product function shows a much more
significant improvement (i.e. as it would be for a longer computation):

Not threaded:  50.6744170189
Using 8 processes: 8.152393

Still not quite linear but certainly an improvement.
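
(The sleep variant is just the hypothetical map function from the sketch
above with an artificial delay, something like:)

import time
import numpy as np
import myutil

def inner_slow(i):
    # Simulate a more expensive per-item computation, so the fork and
    # inter-process communication overhead becomes negligible.
    time.sleep(1)
    return np.sum(myutil.arrayList[i] * myutil.baseArray)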

Cheers

Robin

On Thu, Jun 16, 2011 at 9:19 PM, Robin <robince at gmail.com> wrote:
> The fact that you are still passing the myutil.arrayList argument to
> pool.map means the data is still being pickled: all arguments to
> pool.map are pickled to be passed to the subprocesses. You need to
> change the function so that it accesses the data directly, and then just
> pass indices to pool.map.
>
> Cheers
>
> Robin
>
> On Thu, Jun 16, 2011 at 9:05 PM, Brandt Belson <bbelson at princeton.edu> wrote:
>> Hi all,
>> Thanks for the replies. As mentioned, I'm parallelizing so that I can take
>> many inner products simultaneously (which I agree is embarrassingly
>> parallel). The library I'm writing asks the user to supply a function that
>> takes two objects and returns their inner product. After all the discussion,
>> though, it seems this is too simplistic an approach. Instead, I plan to
>> write this part of the library as if the inner product function supplied by
>> the user uses all available cores (with numpy and/or numexpr built with MKL
>> or LAPACK).
>> As far as using Fortran or C with OpenMP goes, this probably isn't worth the
>> time it would take, both for me and the user.
>> I've tried increasing the array sizes and found the same trends, so the
>> slowdown isn't only because the arrays are too small to see the benefit of
>> multiprocessing. I wrote the code to be easy for anyone to experiment with,
>> so feel free to play around with what is included in the profiling, the
>> sizes of arrays, functions used, etc.
>> I also tried using handythread.foreach with arraySize = (3000,1000), and
>> found the following:
>> No shared memory, numpy array multiplication took 1.57585811615 seconds
>> Shared memory, numpy array multiplication took 1.25499510765 seconds
>> This is definitely an improvement over multiprocessing, but without knowing
>> any better, I was hoping to see a roughly 8x speedup on my 8-core
>> workstation.
>> Based on what Chris sent, it seems there is some large overhead caused by
>> multiprocessing pickling numpy arrays. To test what Robin mentioned:
>>> If you are on Linux or Mac then fork works nicely, so you get read-only
>>> shared memory; you just have to put the data in a module before the fork
>>> (so before pool = Pool()), and then all the subprocesses can access it
>>> without any pickling required, i.e.:
>>> import multiprocessing
>>> import myutil
>>>
>>> # put the data in the module *before* the fork so it is shared, not pickled
>>> myutil.data = listofdata
>>>
>>> # define the map function before Pool() so the forked workers can find it
>>> def mymapfunc(i):
>>>     return mydatafunc(myutil.data[i])
>>>
>>> p = multiprocessing.Pool(8)
>>> p.map(mymapfunc, range(len(myutil.data)))
>> I tried creating the arrayList in the myutil module and using
>> multiprocessing to find the inner products of myutil.arrayList; however,
>> this was still slower than not using multiprocessing, so I believe there is
>> still some large overhead. Here are the results:
>> No shared memory, numpy array multiplication took 1.55906510353 seconds
>> Shared memory, numpy array multiplication took 9.82426381111 seconds
>> Shared memory, myutil.arrayList numpy array multiplication took
>> 8.77094507217 seconds
>> I'm attaching this code.
>> I'm going to work around this numpy/multiprocessing behavior with
>> numpy/numexpr built with MKL or LAPACK. It would be good to know exactly
>> what's causing this, though. It would be nice if there were a way to get the
>> ideal speedup via multiprocessing, regardless of the internal workings of
>> the single-threaded inner product function, as this was the behavior I
>> expected. I imagine other people might come across similar situations, but
>> again I'm going to try to get around this by letting MKL or LAPACK make use
>> of all available cores.
>> Thanks again,
>> Brandt
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: poolmap.py
Type: text/x-python
Size: 1076 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20110617/ebad6cce/attachment.py>

