[Numpy-discussion] Using multiprocessing (shared memory) with numpy array multiplication
Sturla Molden
sturla at molden.no
Wed Jun 15 13:15:09 EDT 2011
On 13.06.2011 19:51, srean wrote:
> If you are on an Intel machine and you have MKL libraries around I
> would strongly recommend that you use the matrix multiplication
> routine if possible. MKL will do the parallelization for you. Well,
> any good BLAS implementation would do the same, you don't really need
> MKL. ATLAS and ACML would work too, just that MKL has been set up for
> us and it works well.
Never mind ATLAS. Alternatives to MKL are GotoBLAS2, ACML and ACML-GPU.
GotoBLAS2 is generally faster than MKL. The relative performance of ACML
and MKL depends on the architecture, but both are now fast on either
architecture. ACML-GPU will move matrix multiplication (*GEMM
subroutines) to the (AMD/ATI) GPU if it can (and the problem is large
enough).
MKL used to run in tortoise mode on AMD chips, but not any longer due to
intervention by the Federal Trade Commission.
IMHO, trying to beat Intel or AMD performance library developers with
Python, NumPy and multiprocessing is just silly. Nothing we do with
array operator * and np.sum is ever going to compare with BLAS functions
from these libraries.
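To make the comparison concrete, here is a minimal sketch, assuming NumPy is built against some BLAS (MKL, GotoBLAS2, ACML or similar). The function names matmul_broadcast and matmul_blas are made up for illustration. Both compute the same matrix product, but the first goes through operator * and np.sum and builds a large temporary, while the second hands the work to np.dot, which for floating-point 2-D arrays normally ends up in the library's *GEMM routine.

```python
import numpy as np

def matmul_broadcast(a, b):
    """Matrix product using operator * and np.sum (hypothetical helper, no BLAS)."""
    # a[:, :, None] has shape (n, k, 1) and b[None, :, :] has shape (1, k, m),
    # so the product materializes an (n, k, m) temporary before the reduction.
    return np.sum(a[:, :, None] * b[None, :, :], axis=1)

def matmul_blas(a, b):
    """Matrix product using np.dot (hypothetical helper).

    Assumption: NumPy is linked against a BLAS, so this reaches *GEMM.
    """
    return np.dot(a, b)

rng = np.random.default_rng(0)
a = rng.standard_normal((60, 40))
b = rng.standard_normal((40, 50))

c_slow = matmul_broadcast(a, b)
c_fast = matmul_blas(a, b)
```

Timing the two (for example with timeit) on larger matrices should show the gap, though the exact ratio depends on the BLAS and the machine.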
Sometimes we need a little bit more coarse-grained parallelism. Then
it's time to think about Python threads and releasing the GIL, or using
OpenMP with C or Fortran.
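A rough sketch of the threaded variant, assuming the heavy work happens inside NumPy/BLAS calls that release the GIL (np.dot on floating-point arrays normally does). The helper blocked_dot and its n_threads parameter are hypothetical names for illustration. If the BLAS is already multithreaded, splitting the work like this may gain nothing; the pattern is more useful when each thread runs a longer chunk of GIL-releasing work.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def blocked_dot(a, b, n_threads=4):
    """Compute np.dot(a, b) by giving each thread a block of rows of a.

    Hypothetical helper: each worker calls np.dot on its own row block, and
    the blocks are stacked back together in the original row order.
    """
    blocks = np.array_split(a, n_threads, axis=0)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # pool.map preserves input order, so the results line up with the blocks.
        parts = list(pool.map(lambda blk: np.dot(blk, b), blocks))
    return np.vstack(parts)

a = np.arange(12.0).reshape(4, 3)
b = np.ones((3, 2))
c = blocked_dot(a, b, n_threads=3)
```

The threads share a and b directly, so nothing is copied or pickled between workers, which is the main practical difference from multiprocessing.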
multiprocessing is the last tool to think about. It is mostly
appropriate for 'embarrassingly parallel' problems, and certainly not
the tool for parallel matrix multiplication.
Sturla