[SciPy-dev] Ideas for scipy.sparse?

Brian Granger ellisonbg.net at gmail.com
Fri Apr 11 13:47:51 EDT 2008


Hi,

Just because there haven't been any interesting threads lately....

So, I am currently implementing a distributed memory array package for python:

http://projects.scipy.org/ipython/ipython/browser/ipythondistarray

The goal is to have distributed/parallel arrays that look and feel
just like numpy arrays.  Here is an example:

import ipythondistarray as ipda

a = ipda.random.rand((10,100,100), dist=(None,'b','c'))
b = ipda.random.rand((10,100,100), dist=(None,'b','c'))
c = 0.5*ipda.sin(a) + 0.5*ipda.cos(b)
print c.sum(), c.mean(), c.std(), c.var()

This works today on multiple processors.  Don't get too excited
though, there is still _tons_ of work to be done....

Here is the main issue:  I need sparse arrays, and the need shows up
in two places:

1) I want to implement sparse distributed arrays and need sparse local
arrays for this.

2) There are other places in the implementation where sparse arrays are needed.

Obviously, my first thought was scipy.sparse.  I am _really_ excited
about the massive improvements that have been happening in this area
recently.  Here are the problems I am running into:

1) I need N-dimensional sparse arrays.  Some of the storage formats in
scipy.sparse (dok, coo, maybe lil) could be generalized to
N-dimensions, but some work would have to be done.

2) I need these things to be in numpy.  I hate to start another
"should this go into numpy or scipy" thread, but I actually do think
there is a decent case for moving the core sparse arrays into numpy
(not the solvers though).  Please hear me out:

a) Numpy at its core is about arrays.  Conceptually, sparse arrays fit
into this narrow vision of Numpy.

b) Sparse arrays are just as foundational as dense arrays in many
areas of computing/science (I would argue that they are more
foundational than FFTs and random numbers).

c) Moving the core sparse arrays into numpy would increase their
visibility and encourage other projects to rely on them.

d) It would not make numpy more difficult to build.

e) It is currently somewhat confusing that they are not in numpy
(remember Numpy = arrays).

3) I need sparse arrays that are implemented more in C.  What do I
mean by this?  I am using Cython for the performance-critical parts of
my package, and there are certain things (indexing in tight loops, for
example) that I need to do in C.  Because the current sparse array
classes are written in pure Python (with a few C++ routines underneath
for format conversions), this is difficult.  So...

I think it would be a very good idea to begin moving the sparse array
classes to Cython code.  This would be a very nice approach because it
could be done gradually, without breaking any of the API.  The benefit
is that we could improve the performance of the sparse array classes
dramatically, while keeping things very maintainable.

In summary, I am proposing:

1) That we move the core sparse array classes from scipy.sparse to a
new package numpy.sparse

2) That we extend some sparse array classes to be fully N-dimensional.

3) That we begin to move their implementation to Cython (as an
aside, Cython does play very well with templated C++ code).  This
could provide a much nicer way of tying into the C++ code than using
SWIG.
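
To make the N-dimensional idea a bit more concrete, here is a rough
sketch of what a dok-style array generalized to N dimensions might look
like, with a conversion to a coo-style (coordinates, values) pair.  The
class name NDDok and its methods are made up for illustration; nothing
like this exists in scipy.sparse, and a real version would need
slicing, arithmetic and a much faster implementation:

```python
import numpy as np


class NDDok(object):
    """Hypothetical N-dimensional dictionary-of-keys sparse array.

    Illustration only: not part of scipy.sparse or numpy.
    """

    def __init__(self, shape, dtype=float):
        self.shape = tuple(shape)
        self.dtype = np.dtype(dtype)
        self._data = {}  # maps full index tuples to nonzero values

    def _check(self, index):
        # Accept a bare integer for the 1-dimensional case.
        if not isinstance(index, tuple):
            index = (index,)
        index = tuple(int(i) for i in index)
        if len(index) != len(self.shape):
            raise IndexError("expected %d indices" % len(self.shape))
        for i, n in zip(index, self.shape):
            if not 0 <= i < n:
                raise IndexError("index out of bounds")
        return index

    def __setitem__(self, index, value):
        index = self._check(index)
        if value == 0:
            # Storing a zero removes the entry, keeping the dict sparse.
            self._data.pop(index, None)
        else:
            self._data[index] = self.dtype.type(value)

    def __getitem__(self, index):
        # Missing entries read as zero.
        return self._data.get(self._check(index), self.dtype.type(0))

    @property
    def nnz(self):
        return len(self._data)

    def to_coo(self):
        """Return (coords, values); coords has shape (ndim, nnz)."""
        keys = sorted(self._data)
        coords = np.array(keys, dtype=np.intp)
        coords = coords.reshape(len(keys), len(self.shape)).T
        values = np.array([self._data[k] for k in keys], dtype=self.dtype)
        return coords, values

    def todense(self):
        out = np.zeros(self.shape, dtype=self.dtype)
        coords, values = self.to_coo()
        # One index array per dimension selects exactly the stored entries.
        out[tuple(coords)] = values
        return out


a = NDDok((10, 100, 100))
a[1, 2, 3] = 4.5
a[9, 99, 99] = -1.0
coords, values = a.to_coo()
```

The point is only that dok and coo store explicit index tuples, so
nothing in the storage itself is tied to two dimensions.  Formats like
csr and csc are built around rows and columns and would not generalize
this directly.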

Alright, fire away :)

Brian
