Exploiting Dual Core's with Py_NewInterpreter's separated GIL ?

sturlamolden sturlamolden at yahoo.no
Wed Nov 8 12:41:15 EST 2006


robert wrote:

> I'd like to use multiple CPU cores for selected time consuming Python computations (incl. numpy/scipy) in a frictionless manner.

Threads are not the preferred way of exploiting multiple processors in
this context, or in scientific computing in general.

Here are a few thoughts on the matter:

1. SciPy uses ATLAS/BLAS and LAPACK. You can compile these libraries
for SMPs. The same goes for FFTW, vendor optimized math kernels, etc.
If most of the CPU time is spent inside these numeric libraries, using
multi-processor versions of these libraries is a much better strategy.

2. The number of CPUs is not the only speed-limiting factor on an SMP.
Use of cache and prefetching is just as important. That can make
multi-processor aware numeric libraries a lot more efficient than
manual multi-threading.

3. One often uses cluster architectures (e.g. Beowulf) instead of SMPs
for scientific computing. MPI works on SMP and clusters. Threads only
work on SMPs.

4. Fortran compilers can recognize parallel array statements in
Fortran90/95 and exploit multiple processors on an SMP automatically.
NumPy should be able to do the same when it matures. E.g. if you make a
statement like "arr[1,::] = arr[2,::] * arr[3,::]", then this statement
could be evaluated in parallel on multiple CPUs, without any
multi-threading on your part. Since the order in which the
multiplications are performed is of no significance, the work can just
as well be spread out to multiple processors in an SMP or a cluster.
NumPy is still immature, but Fortran compilers have done this for at
least two decades.

5. Streaming SIMD extensions (SSE) and similar opcodes: Are you aware
that Pentium III (and newer) processors can do four single-precision
floating-point operations in parallel with a single SSE instruction
(each 128-bit register holds four 32-bit floats)? You could
theoretically quadruple your single-precision flops using the SSE
registers, using no threading at all. (The actual improvement is
somewhat less, due to the extra book-keeping required to get the data
in and out of the SSE registers.)
Again this requires modifications inside NumPy, not multi-threading.
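
To illustrate the claim in point 4, here is a minimal sketch (not
anything NumPy does internally) showing that the array statement gives
the same result however the element-wise multiplications are
partitioned and ordered. The helper multiply_rows_in_chunks and the
test array are made up for the illustration; each chunk is evaluated
serially here, but could equally be handed to a different CPU.

```python
import numpy as np

def multiply_rows_in_chunks(arr, n_chunks, order):
    """Evaluate arr[1, :] = arr[2, :] * arr[3, :] one column chunk at a time.

    `order` lists the chunk indices in the sequence they are processed.
    The chunks do not depend on each other, so any order is valid.
    """
    out = arr.copy()
    chunks = np.array_split(np.arange(arr.shape[1]), n_chunks)
    for k in order:
        cols = chunks[k]
        out[1, cols] = arr[2, cols] * arr[3, cols]
    return out

# Hypothetical 4 x 12 test array; the values are arbitrary.
arr = np.arange(48, dtype=float).reshape(4, 12)

# The whole-array statement from the text, evaluated in one go.
whole = arr.copy()
whole[1, :] = whole[2, :] * whole[3, :]

# The same statement evaluated in four chunks, in two different orders.
forward = multiply_rows_in_chunks(arr, 4, [0, 1, 2, 3])
shuffled = multiply_rows_in_chunks(arr, 4, [2, 0, 3, 1])

print(np.array_equal(whole, forward), np.array_equal(whole, shuffled))
```

Because no element of the result depends on any other, a library or
compiler is free to pick the partitioning, which is why this kind of
parallelism can live below the level of the user's code.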

> If not, would it be an idea to create such thing in the Python std libs to make Python multi-processor-ready. I guess Python will always have a GIL - otherwise it would loose lots of comfort in threaded programming

I would say that the GIL actually has very little effect on Python's
potential in high-performance numeric and scientific computing. It all
depends on the libraries, not on Python per se. Threading is for making
certain tasks more comfortable to write, not so much for computational
speed.



