[Matrix-SIG] parallelized NumPy?

James A. Crotinger jac@lanl.gov
Tue, 05 Jan 1999 11:01:09 -0700


----------
>From: "Paul F. Dubois" <dubois1@llnl.gov>
>To: <matrix-sig@python.org>, "Alan Grossfield" <alan@groucho.med.jhmi.edu>
>Subject: Re: [Matrix-SIG] parallelized NumPy?
>Date: Mon, Jan 4, 1999, 5:28 PM
>

>
>-----Original Message-----
>From: Alan Grossfield <alan@groucho.med.jhmi.edu>
>To: matrix-sig@python.org <matrix-sig@python.org>
>Date: Monday, January 04, 1999 6:03 AM
>Subject: [Matrix-SIG] parallelized NumPy?
>
>
>>I'm curious -- is anyone working on a parallelized version of NumPy?
>>I poked around the various archives but didn't turn anything up.
>>
>If you mean a numerical object that is actually a distributed array, no. My
>best guess is that the quickest means to that end would involve using LANL's
>data-parallel arrays in their POOMA project.
>See http://www.acl.lanl.gov/software. If you mean arrays that do their
>operations on shared data with many threads, no.

Actually, this is exactly what the initial release of the Pooma II Array
class does for Arrays with "multi-patch Engines"; i.e., Pooma still expects
the user to do the domain decomposition. Given a set of domain-decomposed
arrays, Pooma can use threads to run array expressions involving these
arrays in parallel.

HOWEVER, like Blitz, Pooma II uses expression templates to analyze
expressions and build the "runnable" objects that are passed off to a
work queue for threads to execute. It isn't obvious to me that this can
be done with Python wrappers around Pooma II arrays, short of doing run-time
code generation. One could, of course, set up all of the basic binary
operations so that they run in a threaded fashion, and have Python's
operators simply call the compiled Pooma operators, which would then run
in parallel; but you would still pay the cost of a temporary array for
every binary operation, which is exactly the overhead expression templates
are designed to avoid.

For more information on Pooma II Arrays, take a look at

http://www.acl.lanl.gov/pooma/papers/GenericProgrammingPaper/dagstuhl.pdf
http://www.acl.lanl.gov/pooma/papers/iscope98.pdf
http://www.acl.lanl.gov/pooma/papers/GenericProgrammingInPOOMA

Steve Karmesin also gave an excellent presentation at ISCOPE 98 that will be
put up on the web site in the near future. 

You can download the first release of the Pooma II framework from

http://www.acl.lanl.gov/pooma/download.html

This release has shared-memory parallel arrays that have so far been tested
only on SGIs, where we use a custom light-weight thread package. We're
working on an implementation that sits on top of Pthreads. We have a version
of the light-weight thread package for Linux, but we've had problems with
the thread safety of glibc, owing to the fact that we also use a custom
mutex (a spin-lock). We hope to have shared-memory parallelism working on
Linux in the near future.

Our long-term goal is to support cross-box distributed arrays (our primary
platform is a collection of 48 128-processor Origin 2000s), but this will
probably not be done until the second half of '99, as we're first trying to
get the full Pooma I functionality working on-box (particles, fields, and
other features used in writing Pooma application codes).

For more information, check out the web site (www.acl.lanl.gov/pooma) or
contact me.

  Jim