[Numpy-discussion] parallel numpy (by Brian Granger) - any info?

Tue Jan 8 13:10:56 EST 2008

> Yes, the problem in this implementation is that it uses pthreads for
> synchronization instead of spin locks with a work pool implementation
> tailored to numpy.  The thread synchronization overhead is horrible
> (300,000-400,000 clock cycles) and swamps anything other than very large
> arrays.  I have played with spin-lock based solutions that cut this to
> 3000-4000 cycles on average.  With that, arrays of 1e3-1e4 elements can
> start seeing parallelization benefits.  However, this code hasn't passed
> the mad-scientist tinkering stage...  I haven't touched it in at least 6
> months, and I doubt I'll get back to it very soon (go Brian!).  It did
> look promising for up to 4 processors on some operations (sin, cos,
> etc.) and worth-it-but-less-beneficial on simple operations (+,-,*,
> etc.).  Coupled with something like weave.blitz or numexpr that can (or
> could) compile multiple binary operations into a single kernel,
> expressions with multiple simple operations would scale very
> well.
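
A rough way to see why the synchronization overhead decides which array
sizes benefit: compare a serial cost of n*c cycles against a threaded
cost of n*c/p + overhead.  The sketch below is only a back-of-the-envelope
model.  The function name is made up for illustration, the overhead values
are midpoints of the ranges quoted above, and the cost of 1 cycle per
element is purely an assumption (real ufuncs vary widely).

```python
def break_even_elements(sync_overhead_cycles, cycles_per_element, n_threads):
    """Array length above which the threaded version wins in this toy model.

    Serial cost:   n * c
    Threaded cost: n * c / p + overhead
    Threading pays off once n > overhead / (c * (1 - 1/p)).
    """
    if n_threads < 2:
        raise ValueError("need at least two threads")
    saved_per_element = cycles_per_element * (1.0 - 1.0 / n_threads)
    return sync_overhead_cycles / saved_per_element


# Overheads: midpoints of the quoted pthreads and spin-lock ranges.
# cycles_per_element=1.0 is an assumed cost, not a measurement.
for overhead in (350_000, 3_500):
    n = break_even_elements(overhead, cycles_per_element=1.0, n_threads=4)
    print(f"overhead {overhead:>7} cycles -> break-even near {n:,.0f} elements")
```

Under these assumptions the break-even point moves from several hundred
thousand elements down to a few thousand, which is in line with the
1e3-1e4 range mentioned above.  More expensive per-element work (sin, cos)
lowers the break-even point further.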

The distributed array stuff that I will be doing will really focus on
scaling across multiple processes.  The design I have in mind would
not use threads for multi-core parallelization.  Instead, my
assumption is that as time goes on the lower-level building blocks
that I will be using (numpy/blas/lapack) will themselves have
threading support.  As much as possible, I will defer the actual
calculations to numpy/blas/lapack so that we get threading support for
free.

So, I am afraid that my project won't really further Eric's threaded
numpy efforts.  But, it will definitely help the multicore situation
if you have an algorithm that 1) doesn't require message passing or 2)
is not communications bound.

The other thing to keep in mind is GPUs, which make things even more
complicated.  Eventually, I think we will be using a very layered
approach with GPUs + CPU threads at the lowest level and message
passing across multiple processes at a higher level.  Because the NASA
grant is focused on deploying things onto NASA supercomputers (where
message passing is essentially required), my focus for now will be on
that side of things.

> My tinkering was aimed at a framework that would allow you to write
> little computational kernels in a prescribed way, and then let a numpy
> load-balancer automatically split the work up between worker threads
> that execute these little kernels.  Ideally, this load-balancer would be
> pluggable.  The inner loop for numpy's universal functions is probably
> very close to, or exactly, the interface for these little kernels.  Also, it
> would be nice to couple this with weave so that kernels written with weave
> could execute in parallel without user effort.  (This is all like the
> "map" part of a map-reduce architecture...  The reduce part also needs to
> fit in the architecture to generically handle things like sum, etc.)
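
The kernel-plus-load-balancer idea above can be sketched in a few lines of
Python.  This is not Eric's code: it uses ordinary Python threads from
concurrent.futures rather than spin locks, the "load balancer" is just a
static even split, and parallel_map / parallel_sum are hypothetical names.
It assumes a 1-D floating-point array and a kernel with the ufunc calling
convention kernel(x, out=...).  Whether it gives any speedup depends on the
numpy build and on thread overhead, so treat it as an illustration of the
structure only.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def chunk_bounds(n, n_workers):
    # Static "load balancer": n_workers nearly equal contiguous chunks.
    return np.linspace(0, n, n_workers + 1).astype(int)


def parallel_map(kernel, x, n_workers=4):
    """Map part: each worker runs kernel(x_chunk, out=out_chunk)."""
    out = np.empty_like(x)
    bounds = chunk_bounds(x.size, n_workers)

    def run(i):
        lo, hi = bounds[i], bounds[i + 1]
        kernel(x[lo:hi], out=out[lo:hi])  # writes into a view of `out`

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        list(pool.map(run, range(n_workers)))  # wait for all chunks
    return out


def parallel_sum(x, n_workers=4):
    """Reduce part: per-chunk partial sums, combined at the end."""
    bounds = chunk_bounds(x.size, n_workers)

    def partial(i):
        return x[bounds[i]:bounds[i + 1]].sum()

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return sum(pool.map(partial, range(n_workers)))


x = np.linspace(0.0, 1.0, 1001)
y = parallel_map(np.sin, x)
total = parallel_sum(x)
```

A pluggable balancer would replace chunk_bounds with something that hands
out smaller chunks dynamically; the kernels themselves would not change.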

It would be truly fantastic if such a framework could handle all
levels of parallelization from the GPU+threads all the way up to
message passing.  Sounds like we have a lot to do :)

> Getting it to work with all flavors of numpy arrays (contiguous,
> non-contiguous, buffered, etc.) is quite a chore, but the contiguous
> arrays (and perhaps some non-contiguous) offer some relatively low
> hanging fruit.  Here's to hoping Brian's project bears fruit.

The distributed arrays that I am building will store their local data
in a numpy array - at this point, I don't think it will require local
data to be contiguous.  But for shared memory+threading situations,
this will be more of an issue.
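
For readers unsure what the contiguity question looks like in practice,
here is a small example using plain numpy.  The array and variable names
are made up for illustration.  A strided view shares memory with its
parent but is not contiguous, and np.ascontiguousarray makes a packed copy
when a kernel needs one.

```python
import numpy as np

local = np.arange(12.0).reshape(3, 4)   # freshly created, C-contiguous
view = local[:, ::2]                    # every other column, strided view

print(local.flags["C_CONTIGUOUS"])      # contiguity of the parent array
print(view.flags["C_CONTIGUOUS"])       # contiguity of the strided view

# A packed copy with the same values, suitable for code that assumes
# one unbroken block of memory.
packed = np.ascontiguousarray(view)
```

The copy costs memory and time, which is why supporting non-contiguous
data directly matters more in the shared-memory case.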

> I haven't thought about matrix ops much, so I don't know if they would
> fit this (minimally) described architecture.  I am sure that they would
> scale well.

Here I think the best approach is to defer to thread-enabled
BLAS/LAPACK packages.

>
> eric