[SciPy-dev] feedback on scipy.sparse

Fri Dec 14 07:31:03 EST 2007

Hi Nathan

On Wed, Dec 12, 2007 at 07:14:49PM -0600, Nathan Bell wrote:
> On Dec 12, 2007 2:28 AM, Stefan van der Walt <stefan at sun.ac.za> wrote:
> > > Also, feel free to respond with any other comments related to
> > > scipy.sparse
> >
> > At the moment, IIRC, functionality for different kinds of sparse
> > arrays are located in the same classes, separated with if's.  I would
> > like to see the different classes pulled completely apart, so the only
> > overlap is in common functionality.
> 
> Do you mean the use of _cs_matrix() to abstract the common parts of
> csr_matrix and csc_matrix?  If so, I recently removed the ifs from the
> constructor and replaced them with a better solution.  I think the
> present implementation is a reasonable compromise between readability
> and redundancy.  In the past the two classes were completely separate,
> each consisting of a few hundred lines of code, and had a tendency to
> drift apart since edits to one didn't always make it into the other.
> Tim's refactoring fixed this without complicating the implementation
> substantially.

I think _cs_matrix is a good idea: the two classes share similar
storage.  Having 'if' statements inside _cs_matrix to check which of
the two formats you are working with, however, would not be a good
idea (but I don't see any of those).
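
To make that concrete, here is a rough sketch of a shared base class
with no format checks in it.  The names CompressedBase, RowCompressed,
ColCompressed and _swap are made up for illustration and are not the
actual scipy.sparse code; the only assumption is the usual
data/indices/indptr storage.

```python
# Hypothetical sketch; these are NOT the real scipy.sparse classes.
class CompressedBase(object):
    """Shared storage: data, indices, indptr, compressed along one axis."""

    def __init__(self, shape, data, indices, indptr):
        self.shape = shape
        self.data = list(data)
        self.indices = list(indices)
        self.indptr = list(indptr)

    def _swap(self, pair):
        """Map (row, col) to (major, minor); each subclass overrides this."""
        raise NotImplementedError

    def get(self, row, col):
        # Common code: no check of which format we are working with.
        major, minor = self._swap((row, col))
        for k in range(self.indptr[major], self.indptr[major + 1]):
            if self.indices[k] == minor:
                return self.data[k]
        return 0


class RowCompressed(CompressedBase):
    def _swap(self, pair):
        return pair  # rows are the major axis


class ColCompressed(CompressedBase):
    def _swap(self, pair):
        return pair[1], pair[0]  # columns are the major axis
```

The format-specific part shrinks to one small method per subclass, so
an edit to get() cannot drift between the two classes.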

> > I'd also like to discuss the in-place memory assignment policy.  When
> > do we copy on write, and when do we return views?  For example, taking
> > a slice out of a lil_matrix returns a new sparse array.  It is
> > *possible* to create a view, but it gets a bit tricky.  If each array
> > had an "origin" property, such views could be trivially constructed,
> > but it still does not cater for slices like x[::2].
> 
> That is a hard problem.  Can you think of specific uses of this kind
> of functionality that merit the complexity of implementing it?  For
> slices like x[::2] you could introduce a stride tuple in the views,
> but that could get ugly fast.

Say a user wants to examine the first 500 rows of a sparse matrix:

x = build_sparse_matrix()
print x[:500]

Allocating a new matrix just to look at those rows seems wasteful (there
may not even be enough memory to do so).  Which reminds me: print x[:500]
only yields a short description of the sparse matrix.  Do we have a way
to print its elements?
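
As a rough sketch of what I mean by a view with an "origin": the
classes below are toy stand-ins (ListOfRows, RowView and entries() are
made-up names, not scipy.sparse API), and the origin here is only a row
offset, so strided slices like x[::2] are still not covered.

```python
# Hypothetical sketch: ListOfRows and RowView are made-up names,
# not part of scipy.sparse.
class ListOfRows(object):
    """Toy lil-style storage: one {column: value} dict per row."""

    def __init__(self, nrows, ncols):
        self.shape = (nrows, ncols)
        self.rows = [{} for _ in range(nrows)]

    def __setitem__(self, index, value):
        i, j = index
        self.rows[i][j] = value

    def __getitem__(self, index):
        i, j = index
        return self.rows[i].get(j, 0)


class RowView(object):
    """View of rows [origin, origin + nrows) of a parent; nothing is copied."""

    def __init__(self, parent, origin, nrows):
        self.parent = parent
        self.origin = origin
        self.shape = (nrows, parent.shape[1])

    def __getitem__(self, index):
        i, j = index
        return self.parent[self.origin + i, j]

    def __setitem__(self, index, value):
        i, j = index
        self.parent[self.origin + i, j] = value  # writes through to the parent

    def entries(self):
        """Yield (row, col, value) for stored elements, in view coordinates."""
        for i in range(self.shape[0]):
            row = self.parent.rows[self.origin + i]
            for j in sorted(row):
                yield (i, j, row[j])


x = ListOfRows(1000, 1000)
x[3, 7] = 2.5
x[700, 1] = 9.0

head = RowView(x, 0, 500)      # no new storage is allocated for the rows
for entry in head.entries():   # one way to print the elements themselves
    print(entry)               # prints (3, 7, 2.5)
```

Something like entries() would also answer the printing question,
since it walks the stored elements rather than describing the matrix.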

Are we aiming to support striding on assignments?  I.e.

x[::2] = 5

I suspect that will not be worth the trouble, since a for loop can be
used to assign all the elements.
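
The loop I have in mind is short.  A minimal sketch, using a plain dict
keyed by (row, col) as a stand-in for a sparse matrix (the function
name assign_strided_rows is made up for this example):

```python
# Hypothetical helper; a dict keyed by (row, col) stands in for the matrix.
def assign_strided_rows(x, nrows, ncols, step, value):
    """Set every element of rows 0, step, 2*step, ... to value."""
    for i in range(0, nrows, step):
        for j in range(ncols):
            x[i, j] = value


x = {}
assign_strided_rows(x, 6, 4, 2, 5)   # the loop version of x[::2] = 5
```

Note that this stores every element of the selected rows, so those rows
are no longer sparse afterwards.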

Regards
Stéfan