[Numpy-discussion] Using multiprocessing (shared memory) with numpy array multiplication

Bruce Southey bsouthey at gmail.com
Thu Jun 16 10:16:16 EDT 2011


On 06/15/2011 07:25 PM, Sturla Molden wrote:
> Den 15.06.2011 23:22, skrev Christopher Barker:
>> I think the issue got confused -- the OP was not looking to speed up a
>> matrix multiply, but rather to speed up a whole bunch of independent
>> matrix multiplies.
> I would do it like this:
>
> 1. Write a Fortran function that makes multiple calls to DGEMM in a do loop.
> (Or use the Fortran intrinsics dot_product or matmul.)
>
> 2. Put an OpenMP pragma around the loop (!$omp parallel do) and enable
> OpenMP when compiling. Use static or guided thread scheduling.
>
> 3. Call Fortran from Python using f2py, ctypes or Cython.
>
> Build with a thread-safe and single-threaded BLAS library.
>
> That should run as fast as it gets.
>
> Sturla

The idea was to calculate:
innerProduct = numpy.sum(array1*array2)
where array1 and array2 are, in general, multidimensional.

Numpy has an inner product function called np.inner
(http://www.scipy.org/Numpy_Example_List_With_Doc#inner), which is a
special case of tensordot. See the documentation for what it does and
for other examples. Also note that np.inner gives you the cross products
(the inner products between different rows) as well. For example, you
can just do:

 >>> import numpy as np
 >>> a = np.arange(10).reshape(2, 5)
 >>> a
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
 >>> b = a*10
 >>> b
array([[ 0, 10, 20, 30, 40],
       [50, 60, 70, 80, 90]])
 >>> c = a*b
 >>> c
array([[  0,  10,  40,  90, 160],
       [250, 360, 490, 640, 810]])
 >>> c.sum()
2850
 >>> d = np.inner(a, b)
 >>> d
array([[ 300,  800],
       [ 800, 2550]])
 >>> np.diag(d).sum()
2850
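
For completeness, the same total also comes out of a single np.tensordot
call that contracts over both axes at once, so the off-diagonal products
are never formed (the exact repr of the 0-d result may vary slightly
between numpy versions):

 >>> np.tensordot(a, b, axes=2)
array(2850)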


I do not know whether numpy's multiplication, np.inner() and np.sum() are
single-threaded and thread-safe when you link to multi-threaded BLAS,
LAPACK or ATLAS libraries. But if everything is *single-threaded* and
thread-safe, then you can just create a function and use Anne's very
useful handythread.py (http://www.scipy.org/Cookbook/Multithreading).
Otherwise you would have to determine the best approach, such as doing
nothing (especially if the arrays are huge) or making the functions
single-threaded.
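
In case it helps, here is a minimal thread-based sketch of that idea using
only the standard library instead of handythread. The array shapes and the
worker count are made up for illustration, and any speed-up depends on the
numpy ufuncs releasing the GIL while they do the elementwise work:

import numpy as np
from concurrent.futures import ThreadPoolExecutor

def inner_product(pair):
    # Elementwise multiply one pair of arrays and sum the result.
    x, y = pair
    return np.sum(x * y)

# A batch of independent array pairs (sizes chosen arbitrarily here).
pairs = [(np.random.rand(200, 200), np.random.rand(200, 200))
         for _ in range(50)]

# One pair per task; numpy does the heavy lifting in C, so the worker
# threads can overlap whenever the GIL is released during the ufunc loops.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(inner_product, pairs))

print(results[:3])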

By the way, if the arrays are sufficiently small, there is enough
overhead involved that more time goes into communication than into
computation. So you have to determine the best algorithm for your case.
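
To make that concrete, here is a rough timing sketch (the numbers will
depend entirely on your machine; for many small arrays the process pool
is typically slower than the plain loop because of pickling and IPC):

import numpy as np
from multiprocessing import Pool
from timeit import default_timer as timer

def inner_product(pair):
    x, y = pair
    return np.sum(x * y)

if __name__ == "__main__":
    # Many *small* pairs: each task is cheap, so communication dominates.
    pairs = [(np.random.rand(10, 10), np.random.rand(10, 10))
             for _ in range(2000)]

    start = timer()
    serial = [inner_product(p) for p in pairs]
    print("serial loop : %.4f s" % (timer() - start))

    start = timer()
    with Pool(processes=4) as pool:
        parallel = pool.map(inner_product, pairs)
    print("Pool(4).map : %.4f s" % (timer() - start))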

Bruce



