[SciPy-dev] Ideas for scipy.sparse?

Anne Archibald peridot.faceted at gmail.com
Sat Apr 12 00:53:59 EDT 2008


On 11/04/2008, Brian Granger <ellisonbg.net at gmail.com> wrote:
> > >  1) I need N-dimensional sparse arrays.  Some of the storage
> > >  formats in scipy.sparse (dok, coo, maybe lil) could be
> > >  generalized to N dimensions, but some work would have to be done.
> >
> >  To make this efficient, you'd probably need a lower-level
> >  implementation of a hash-based container like DOK.
> >
> >  BTW, which applications use sparse N-dimensional arrays?
>
> The big place that I need them right now is for handling so-called
> ghost cells.  If you are not familiar with this idea, ghost cells
> are the parts of a distributed array that are not on your local
> processor (they are on another one).  When we fetch the ghost cells,
> we need a structure to store them in, and a sparse N-dim array is
> the best option.

Hmm.
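
On the DOK generalization mentioned above: the basic container is
simple enough to sketch in pure Python. This is only a rough sketch
(the name NDimDOK and its interface are invented here for
illustration), and as the quoted message notes, making it fast would
take a lower-level hash-based implementation:

class NDimDOK:
    """Minimal N-dimensional dictionary-of-keys sparse array sketch.

    Keys are index tuples; anything not stored reads as the fill value.
    """

    def __init__(self, shape, fill=0.0):
        self.shape = shape
        self.fill = fill
        self._data = {}

    def __setitem__(self, idx, value):
        if len(idx) != len(self.shape):
            raise IndexError("expected %d indices" % len(self.shape))
        if value == self.fill:
            self._data.pop(idx, None)   # never store the fill value
        else:
            self._data[idx] = value

    def __getitem__(self, idx):
        return self._data.get(idx, self.fill)

a = NDimDOK((100, 100, 100))
a[3, 4, 5] = 2.5
print(a[3, 4, 5], a[0, 0, 0])   # 2.5 0.0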

It seems to me that, in the interests of efficiency, it will not often
be a good idea to allocate data to processors element by element;
instead one will usually want to allocate blocks of elements to each
processor. Ordinary sparse matrices represent this very inefficiently,
since they take no advantage of ranges or higher-dimensional blocks of
nonzero entries. What's more, an opaque sparse matrix library would be
very frustrating in this context, since you want to distinguish
entries that are unavailable from entries that are actually zero.
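
To illustrate that last point with the existing scipy.sparse
interface: in a dok_matrix, an entry explicitly set to zero reads back
exactly like one that was never set at all, so the matrix itself has
no way to signal "not fetched yet":

from scipy.sparse import dok_matrix

m = dok_matrix((4, 4))
m[1, 2] = 0.0    # an entry we "know" is zero
print(m[1, 2])   # 0.0
print(m[3, 3])   # 0.0 as well -- never set, yet it reads identically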

It seems like what you need is some sort of proxy object that keeps
track of the chunks of an array and serves them up as requested: when
a chunk is available locally it can be returned directly, possibly
even as a view of an underlying numpy array, and when it is not, the
proxy calls some network message-passing code to fetch it.
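
Very roughly, and purely as a sketch (ChunkProxy and _fetch_remote are
made-up names, and the message-passing part is stubbed out), such a
proxy might look like:

import numpy as np

class ChunkProxy:
    """Serve chunks of a distributed array: local chunks directly,
    remote ones via (stubbed-out) message passing."""

    def __init__(self, shape, chunkshape):
        self.shape = shape
        self.chunkshape = chunkshape
        self.local = {}   # chunk origin -> numpy array

    def _origin(self, idx):
        # Corner of the chunk containing a given global index.
        return tuple((i // c) * c for i, c in zip(idx, self.chunkshape))

    def _fetch_remote(self, origin):
        # Placeholder for the real network/message-passing code.
        raise NotImplementedError("chunk at %r is on another processor"
                                  % (origin,))

    def __getitem__(self, idx):
        origin = self._origin(idx)
        chunk = self.local.get(origin)
        if chunk is None:
            chunk = self._fetch_remote(origin)  # network round trip
            self.local[origin] = chunk          # cache the ghost chunk
        return chunk[tuple(i - o for i, o in zip(idx, origin))]

p = ChunkProxy((100, 100), (10, 10))
p.local[0, 0] = np.arange(100.0).reshape(10, 10)  # a locally owned chunk
print(p[3, 7])   # 37.0, served out of the local chunk

Note that, unlike a sparse matrix, an unavailable chunk here is an
explicit event (a fetch, or an error) rather than a silent zero.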

How do existing distributed-array toolkits handle the problem?

Anne


