[Numpy-discussion] Using multiprocessing (shared memory) with numpy array multiplication

Mon Jun 13 13:51:08 EDT 2011

Looking at the code, the arrays that you are multiplying seem fairly
small (300, 200), and you have 50 of them. So it might be the case that
there is not enough computational work to compensate for the cost of
forking new processes and communicating the results. Have you tried
larger arrays and more of them?
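
A rough way to check this is to time one unit of work against the cost
of serializing one array, which is roughly what multiprocessing pays
when it pickles data between processes. This is only a sketch: the
elementwise inner product below is my assumption about what the
user-supplied function does, the names are made up, and it ignores
process start-up time and any shared-memory setup.

```python
import pickle
import timeit

import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((300, 200))
b = rng.standard_normal((300, 200))


def inner_product(u, v):
    # Hypothetical stand-in for the user-supplied inner product.
    return np.sum(u * v)


n = 1000
# Average time for one inner product of two (300, 200) arrays.
t_work = timeit.timeit(lambda: inner_product(a, b), number=n) / n
# Average time to pickle and unpickle one such array.
t_ship = timeit.timeit(lambda: pickle.loads(pickle.dumps(a)), number=n) / n

print(f"one inner product: {t_work * 1e6:.1f} us")
print(f"one pickle round trip: {t_ship * 1e6:.1f} us")
```

If the two numbers are of the same order, farming out 50 such products
is unlikely to pay off.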

If you are on an Intel machine and you have the MKL libraries around, I
would strongly recommend that you use the matrix multiplication
routine if possible. MKL will do the parallelization for you. Well,
any good BLAS implementation would do the same; you don't really need
MKL. ATLAS and ACML would work too; it is just that MKL has been set up
for us and it works well.
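
As a minimal sketch of what I mean by the matrix multiplication
routine: for 2-D floating-point arrays, np.dot hands the product to
whatever BLAS your NumPy was built against, provided it was built
against one. The array shapes are just for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((300, 200))
b = rng.standard_normal((200, 300))

# With a BLAS-linked NumPy, this 2-D float product is computed by the
# BLAS matrix-multiply routine; a threaded BLAS parallelizes it itself.
c = np.dot(a, b)

# Prints the BLAS/LAPACK libraries this NumPy build was configured with.
np.show_config()
```

The number of threads is normally controlled by the BLAS library, not
by NumPy.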

To give an idea: given the amount of tuning and optimization that
these libraries have undergone, a numpy.sum would be slower than a
multiplication with a vector of all ones. So, in the interest of speed,
the longer you stay in the BLAS context the better.
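
You can try the comparison yourself with something like the sketch
below. The array size and repeat count are arbitrary choices of mine,
and which one wins depends on your NumPy version and the BLAS it is
linked against, so treat the timings as something to measure rather
than assume.

```python
import timeit

import numpy as np

x = np.random.default_rng(0).standard_normal(1_000_000)
ones = np.ones_like(x)

# Two ways to add up the entries of x.
total_sum = np.sum(x)
total_dot = np.dot(x, ones)  # vector-vector product, goes through BLAS

t_sum = timeit.timeit(lambda: np.sum(x), number=200)
t_dot = timeit.timeit(lambda: np.dot(x, ones), number=200)
print(f"np.sum: {t_sum:.4f} s   np.dot with ones: {t_dot:.4f} s")
```

The two totals agree only up to floating-point rounding, since the
additions happen in a different order.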

--srean

On Fri, Jun 10, 2011 at 10:01 AM, Brandt Belson <bbelson at princeton.edu> wrote:
> Unfortunately I can't flatten the arrays. I'm writing a library where the
> user supplies an inner product function for two generic objects, and almost
> always the inner product function does large array multiplications at some
> point. The library doesn't get to know about the underlying arrays.
> Thanks,
> Brandt