[Numpy-discussion] Using multiprocessing (shared memory) with numpy array multiplication

Wed Jun 15 13:15:09 EDT 2011

Den 13.06.2011 19:51, skrev srean:
> If you are on an Intel machine and you have MKL libraries around I
> would strongly recommend that you use the matrix multiplication
> routine if possible. MKL will do the parallelization for you. Well,
> any good BLAS implementation would do the same, you don't really need
> MKL. ATLAS and ACML would work too, just that MKL has been set up for
> us and it works well.

Never mind ATLAS.  Alternatives to MKL are GotoBLAS2, ACML and ACML-GPU. 
GotoBLAS2 is generally faster than MKL. The relative performance of ACML 
and MKL depends on the architecture, but both are now fast on either 
architecture. ACML-GPU will move matrix multiplication (*GEMM 
subroutines) to the (AMD/ATI) GPU if it can (and the problem is large 
enough).

MKL used to run in tortoise mode on AMD chips, but not any longer due to 
intervention by the Federal Trade Commission.

IMHO, trying to beat Intel or AMD performance library developers with 
Python, NumPy and multiprocessing is just silly. Nothing we do with 
array operator * and np.sum is ever going to compare with BLAS functions 
from these libraries.
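
To make the comparison concrete, here is a minimal sketch that computes the same matrix product twice: once with the array operator * and np.sum, and once with np.dot. It assumes NumPy is linked against a BLAS library, in which case np.dot on 2-D float arrays is handed to the *GEMM routine. The array sizes and variable names are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((100, 200))
b = rng.standard_normal((200, 50))

# Product via the array operator * and np.sum: broadcasting builds a
# (100, 200, 50) temporary and the reduction runs in NumPy's own loops.
c_slow = np.sum(a[:, :, np.newaxis] * b[np.newaxis, :, :], axis=1)

# Same product via np.dot, which is dispatched to BLAS *GEMM when NumPy
# is built against a BLAS library (an assumption about the local build).
c_fast = np.dot(a, b)

# Both routes should agree up to floating-point rounding.
same = np.allclose(c_slow, c_fast)
```

Timing the two lines (for example with timeit) on larger inputs should show the gap, though the exact ratio depends on the BLAS in use and on the hardware.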

Sometimes we need a little bit more coarse-grained parallelism. Then 
it's time to think about Python threads and releasing the GIL, or about 
using OpenMP with C or Fortran.
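
A minimal sketch of the Python-threads route, assuming a NumPy build in which np.dot releases the GIL while the BLAS call runs (I believe current releases do, but check your version). The function gram and the list blocks are hypothetical stand-ins for independent, coarse-grained tasks.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def gram(block):
    # Hypothetical task: the Gram matrix of one independent block.
    # The heavy work is a single np.dot call, during which the GIL
    # can be released so that other threads make progress.
    return np.dot(block.T, block)

rng = np.random.default_rng(0)
blocks = [rng.standard_normal((500, 50)) for _ in range(8)]

# One task per block, spread over a small pool of threads.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(gram, blocks))

# Serial reference, to check that the threaded results are identical in value.
serial = [gram(blk) for blk in blocks]
all_match = all(np.allclose(r, s) for r, s in zip(results, serial))
```

If the BLAS library is itself multithreaded, the two levels of threading can compete for cores, so it may be worth limiting one of them.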

multiprocessing is the last tool to think about. It is mostly 
appropriate for 'embarrassingly parallel' paradigms, and certainly not 
the tool for parallel matrix multiplication.

Sturla