[SciPy-user] Distributed Array Library?

Gregory Crosswhite gcross at u.washington.edu
Thu Apr 26 18:57:14 EDT 2007


Hey everyone!  I would appreciate some advice on a problem I am facing.

I have written code using the numpy library that (among other
things) performs contractions of a tensor network.  Unfortunately, I
have reached the point where my tensors are growing too big to handle
on a single computer, so I want to rework my code so that it works on
a cluster or grid.

My question is:  do you have suggestions for tools that would let me  
have ndarray-like functionality with an array that could be
distributed over many processors?  Specifically, I would like to be  
able to create very large (possibly multi-gigabyte) tensors with an  
arbitrary number of dimensions, to be able to transpose indices and  
reshape dimensions, and to take general tensor products.
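
For concreteness, the three operations look like this on ordinary
in-memory ndarrays.  This is only a single-machine sketch; the shapes
and variable names are made up for illustration, and a distributed
library would have to provide equivalents that work across processors.

```python
import numpy as np

# Small stand-in tensors; the real ones would be far too large for one machine.
a = np.arange(24.0).reshape(2, 3, 4)   # indices (i, j, k)
b = np.arange(60.0).reshape(4, 3, 5)   # indices (k, j, l)

# Transpose indices: reorder the axes of a from (i, j, k) to (k, i, j).
a_t = a.transpose(2, 0, 1)

# Reshape dimensions: merge the first two axes of a into a single axis.
a_r = a.reshape(6, 4)

# General tensor product: contract a's axes (j, k) with b's axes (j, k),
# leaving a result indexed by (i, l).
c = np.tensordot(a, b, axes=([1, 2], [1, 0]))

print(a_t.shape, a_r.shape, c.shape)
```

The tensordot call is the one that matters most for a tensor network,
since every contraction between two tensors reduces to it.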

After searching online, I found a package called GlobalArrays that
allows one to easily create distributed arrays, but it has the
following characteristics that I would have to work around:

	*) No Python binding at present.  (One used to exist, but it has  
disappeared from the internet.  :-) )
	*) No capability for transposing indices or reshaping dimensions
	*) The distributed inner product operations do not take stride  
arguments.

I also saw something called the Tensor Contraction Engine which might
have some support for this kind of thing, but the documentation for
the actual tensor contraction part of the system seemed very sparse,
so I cannot tell whether it does.

I wonder whether it would be feasible to integrate something like  
this into the numpy core;  I looked through the Guide to NumPy (thanks
to Travis for taking the time to write such comprehensive
documentation!) and saw that there were various hooks to implement  
one's own type, along with operations to perform a dot product,  
ufuncs, and the like, but all of these seem to assume that one has a  
uniform memory layout, so that adapting them for a distributed array
would be an exercise in futility.
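
To illustrate what a uniform memory layout means here: an ndarray is a
single contiguous buffer plus per-axis byte strides, and a transpose is
just a view with the strides permuted.  The sketch below shows this
with a small made-up array; nothing in it is distributed, which is
exactly the assumption that breaks down on a cluster.

```python
import numpy as np

# A C-contiguous float64 array: one buffer, 8 bytes per element.
x = np.zeros((2, 3, 4), dtype=np.float64)
print(x.strides)               # (96, 32, 8)

# Transposing copies nothing; it only permutes the strides.
y = x.transpose(2, 0, 1)
print(y.strides)               # (8, 96, 32)

# Both arrays address the same underlying memory.
print(np.shares_memory(x, y))  # True
```

With the data spread over many processors there is no single base
address for strides to index into, so a transpose would generally have
to move data between nodes instead of relabelling axes.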

Do the wise men and women of this list have any advice regarding the  
best tool to use?  :-)

Thank you very much in advance!

- Gregory Crosswhite



