[Numpy-discussion] Using multiprocessing (shared memory) with numpy array multiplication

Bruce Southey bsouthey at gmail.com
Thu Jun 16 10:16:16 EDT 2011


On 06/15/2011 07:25 PM, Sturla Molden wrote:
> Den 15.06.2011 23:22, skrev Christopher Barker:
>> I think the issue got confused -- the OP was not looking to speed up a
>> matrix multiply, but rather to speed up a whole bunch of independent
>> matrix multiplies.
> I would do it like this:
>
> 1. Write a Fortran function that makes multiple calls to DGEMM in a do loop.
> (Or use the Fortran intrinsics dot_product or matmul.)
>
> 2. Put an OpenMP pragma around the loop (!$omp parallel do) and enable
> OpenMP when compiling. Use static or guided thread scheduling.
>
> 3. Call Fortran from Python using f2py, ctypes or Cython.
>
> Build with a thread-safe and single-threaded BLAS library.
>
> That should run as fast as it gets.
>
> Sturla

The idea was to calculate:
innerProduct = numpy.sum(array1*array2)
where array1 and array2 are, in general, multidimensional.

Numpy has an inner product function called np.inner
(http://www.scipy.org/Numpy_Example_List_With_Doc#inner), which is a
special case of tensordot. See the documentation for what it does and
for other examples. Also note that np.inner gives you the cross products
(the inner products between different rows) as well. For example, you
can just do:

 >>> import numpy as np
 >>> a = np.arange(10).reshape(2, 5)
 >>> a
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
 >>> b = a*10
 >>> b
array([[ 0, 10, 20, 30, 40],
       [50, 60, 70, 80, 90]])
 >>> c = a*b
 >>> c
array([[  0,  10,  40,  90, 160],
       [250, 360, 490, 640, 810]])
 >>> c.sum()
2850
 >>> d = np.inner(a, b)
 >>> d
array([[ 300,  800],
       [ 800, 2550]])
 >>> np.diag(d).sum()
2850
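
For completeness, the same total also comes out of a single np.tensordot
call that contracts over both axes at once, so the off-diagonal products
are never formed (the exact repr of the 0-d result may vary slightly
between numpy versions):

 >>> np.tensordot(a, b, axes=2)
array(2850)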


I do not know whether numpy's multiplication, np.inner() and np.sum() are
single-threaded and thread-safe when you link to multi-threaded BLAS,
LAPACK or ATLAS libraries. But if everything is *single-threaded* and
thread-safe, then you can just create a function and use Anne's very
useful handythread.py (http://www.scipy.org/Cookbook/Multithreading).
Otherwise you would have to determine the best approach, such as doing
nothing (especially if the arrays are huge) or making the functions
single-threaded.
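
In case it helps, here is a minimal thread-based sketch of that idea using
only the standard library instead of handythread. The array shapes and the
worker count are made up for illustration, and any speed-up depends on the
numpy ufuncs releasing the GIL while they do the elementwise work:

import numpy as np
from concurrent.futures import ThreadPoolExecutor

def inner_product(pair):
    # Elementwise multiply one pair of arrays and sum the result.
    x, y = pair
    return np.sum(x * y)

# A batch of independent array pairs (sizes chosen arbitrarily here).
pairs = [(np.random.rand(200, 200), np.random.rand(200, 200))
         for _ in range(50)]

# One pair per task; numpy does the heavy lifting in C, so the worker
# threads can overlap whenever the GIL is released during the ufunc loops.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(inner_product, pairs))

print(results[:3])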

By the way, if the arrays are sufficiently small, there is enough
overhead involved that more time goes into communication than into
computation. So you have to determine the best algorithm for your case.
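
To make that concrete, here is a rough timing sketch (the numbers will
depend entirely on your machine; for many small arrays the process pool
is typically slower than the plain loop because of pickling and IPC):

import numpy as np
from multiprocessing import Pool
from timeit import default_timer as timer

def inner_product(pair):
    x, y = pair
    return np.sum(x * y)

if __name__ == "__main__":
    # Many *small* pairs: each task is cheap, so communication dominates.
    pairs = [(np.random.rand(10, 10), np.random.rand(10, 10))
             for _ in range(2000)]

    start = timer()
    serial = [inner_product(p) for p in pairs]
    print("serial loop : %.4f s" % (timer() - start))

    start = timer()
    with Pool(processes=4) as pool:
        parallel = pool.map(inner_product, pairs)
    print("Pool(4).map : %.4f s" % (timer() - start))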

Bruce



