[SciPy-dev] Ideas for scipy.sparse?

Fri Apr 11 17:35:19 EDT 2008

>  >  1) I need N-dimensional sparse arrays.  Some of the storage formats in
>  >  scipy.sparse (dok, coo, maybe lil) could be generalized to
>  >  N-dimensions, but some work would have to be done.
>
>  To make this efficient, you'd probably need a lower-level
>  implementation of a hash-based container like DOK.
>
>  BTW, which applications use sparse N-dimensional arrays?

The big place that I need them right now is for handling so called
ghost cells.  If you are not familiar with this idea, ghost cells are
parts of a distributed array that are not on your local processor
(they are on another one).  When we fetch the ghost cells we need a
structure to store them in and a sparse N-dim array is the best
option.

In terms of true applications for N-dim sparse arrays, I am not sure.
But, the higher-dimension an array is the more important sparsity is.
Sure numpy can handle 6 dimensional arrays, but if they are dense,
they will likely be too big.  So I guess a better question is:  what
are the applications for N-dimensional dense arrays -> if you find an
application for N>3, then you should probably consider using sparse
arrays.

>  >  2) I need these things to be in numpy.  I hate to start another
>  >  "should this go into numpy or scipy" thread, but I actually do think
>  >  there is a decent case for moving the core sparse arrays into numpy
>  >  (not the solvers though).  Please hear me out:
>
>  I don't see the need for or utility of sparse arrays in numpy.  Numpy
>  does low-level manipulation of dense arrays/memory buffers.

True, well sort of.  That is what Numpy does today.  But, I think
there are good reasons to extend that reach more broadly to other
types of arrays.  My understanding of the scope of numpy is that it is
the "base layer."  I feel that sparse arrays are (conceptually) a part
of that base layer.

>
>  >  3) I need sparse arrays that are implemented more in C.  What do I
>  >  mean by this.  I am using cython for the performance critical parts of
>  >  my package and there are certain things (indexing in tight loops for
>  >  example) that I need to do in c.  Because the current sparse array
>  >  classes are written in pure python (with a few c++ routines underneath
>  >  for format conversions), this is difficult.  So...
>
>  All of the costly operations on CSR/CSC/COO matrices are done in C++.
>  Only lil_matrix and dok_matrix are pure-Python implementations.

Yes, but to use the fast c++ code, I have to go through a slow python
layer - the actual csr/csc/coo classes and slow+fast = slow in my
book.  Also, the dok/lil formats are some of the most important and
they should be optimized.

>  Indexing sparse arrays inside a Python loop *is* slow, but there's not
>  much that can be done about it.

I will write a simple dok format sparse matrix using cython and we can
compare the performance.

>
>  >  I think it would be a very good idea to begin moving the sparse array
>  >  classes to cython code.  This would be a very nice approach because it
>  >  could be done gradually, without breaking any of the API.  The benefit
>  >  is that we could improve the performance of the sparse array classes
>  >  drammatically, while keeping things very maintainable.
>
>  I'm not aware of any performance problems with the existing backend to
>  scipy.sparse.  What would you implement in cython that's not already
>  implemented in scipy.sparse.sparsetools?

This is absolutely true.  Don't misunderstand me.  I think that the
c++ backend in sparsetools in _really_ nice and I am not suggesting
changing that to anything else.  The only think I would implement in
cython is the higher-level sparse classes.  These are the slow part
(they are pure python) and cython would help a lot there.  Again, I
_really_ like the current design and interfaces of these classes and I
am not suggesting changing that (other than to support N-dim arrays
for certain formats).

>
>  >  3) That we begin to move their implementation to using Cython (as an
>  >  aside, cython does play very well with templated C++ code).  This
>  >  could provide a much nicer way of tying into the c++ code than using
>  >  swig.
>
>  In the context of scipy.sparse, what's wrong with SWIG and C++?  What
>  would be improved by using Cython?

Think of it in terms of layers (I will use the csr format as an example):

sparse/csr.py        => top level python class that wraps the lower layers.
sparsetools/csr.py =>  thin swig generated python wrapper that
introduces overhead
sparsetools/csr.h  =>     fast c++ code

Using swig forces me to deal with two layers of python code before I
can talk to the fast c++ code.  The problem with this is that I want
to call the top level layer from C!  This would be like NumPy not
having a C API!

The cython stack would look like this

sparse/csr.pyx        => top level python class that is a C extension
type and can be called from C or python
sparsetools/csr.h  =>     fast c++ code

In that case, I could write my extension code in C and talk directly
to the C interface without any overhead of multiple layers of Python.

To me that is a huge difference.  But again, the existing c++ code
could be used with probably no modifications.

Brian
>  >  Alright, fire away :)
>
>  :)
>
>  --
>  Nathan Bell wnbell at gmail.com
>  http://graphics.cs.uiuc.edu/~wnbell/
>  _______________________________________________
>  Scipy-dev mailing list
>  Scipy-dev at scipy.org
>  http://projects.scipy.org/mailman/listinfo/scipy-dev
>