[SciPy-dev] feedback on scipy.sparse

Wed Dec 12 20:14:49 EST 2007

On Dec 12, 2007 2:28 AM, Stefan van der Walt <stefan at sun.ac.za> wrote:
> I'd like to see the separate classes moving into their own files.
>
> Eye, diags etc. make use of specific properties of each array type, so
> I wonder whether those operations shouldn't be implemented as static
> class methods?

That's a possibility.  If we adopt the solution below you could simply
define (in spmatrix)

class spmatrix:

    def eye(self, n):
        # self.format is the subclass's abbreviation ('csr', 'csc', ...)
        return spidentity(n, format=self.format)

I'd prefer to hold off on this idea until it's clear that people want
it.  I fear that adding too many static methods would clutter the
classes.
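
If the method should be callable on the class itself, as the question
suggests, a classmethod is one option.  The sketch below is only an
illustration: the toy spidentity and the class-level 'format' attribute
are assumptions made for the example, not existing scipy.sparse code.

```python
# Hypothetical sketch: toy stand-ins, not the real scipy.sparse classes.

def spidentity(n, format=None):
    """Toy factory: returns (format, entries) instead of a real matrix."""
    if format is None:
        format = 'dia'  # assumed "most natural" default for an identity
    return format, {(i, i): 1.0 for i in range(n)}


class spmatrix:
    format = None  # assumed class attribute, overridden by subclasses

    @classmethod
    def eye(cls, n):
        # Only the class is needed, so no instance has to exist first.
        return spidentity(n, format=cls.format)


class csr_matrix(spmatrix):
    format = 'csr'


class lil_matrix(spmatrix):
    format = 'lil'
```

With this, csr_matrix.eye(3) and lil_matrix.eye(3) each ask the factory
for their own format, and the base class adds a single method rather
than one per format.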

> > I propose the following policy.  Functions get a new parameter
> > 'format' which defaults to None.  The default implies that the
> > function will return the matrix in whatever format is most natural
> > (and subject to change).  For example:
> >    spidentity(n, dtype='d', format=None)
> > might return a dia_matrix(), or a special identity matrix format in
> > the future.  At a minimum, the valid values of 'format' will include
> > the three-letter abbreviations of the currently supported sparse
> > matrix types (e.g. 'csr', 'csc', 'coo', 'lil').  Comments?
>
> Sounds good!

Great.  I'll go ahead with this idea unless someone else weighs in.
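
To make the policy concrete, here is a rough sketch of how such a
function might treat its 'format' argument.  It uses a plain dict
instead of a real matrix, and the SUPPORTED_FORMATS tuple and the
choice of 'dia' as the default are assumptions for illustration only.

```python
# Hypothetical sketch of the 'format' policy; toy containers only.

SUPPORTED_FORMATS = ('csr', 'csc', 'coo', 'lil', 'dia')


def spidentity(n, dtype='d', format=None):
    """Describe an n-by-n identity matrix as a dict.

    format=None means "whatever is most natural"; callers should not
    rely on which format that turns out to be.
    """
    if format is None:
        format = 'dia'  # the choice made in this sketch, subject to change
    if format not in SUPPORTED_FORMATS:
        raise ValueError("unknown sparse format: %r" % (format,))
    return {'format': format, 'dtype': dtype, 'shape': (n, n),
            'diagonal': [1] * n}
```

Code that needs a particular format passes it explicitly, e.g.
spidentity(5, format='csr'), while code that does not care leaves the
default alone and keeps working if the natural format changes later.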

> > Also, feel free to respond with any other comments related to
> > scipy.sparse
>
> At the moment, IIRC, functionality for different kinds of sparse
> arrays is located in the same classes, separated by ifs.  I would
> like to see the different classes pulled completely apart, so the only
> overlap is in common functionality.

Do you mean the use of _cs_matrix() to abstract the common parts of
csr_matrix and csc_matrix?  If so, I recently removed the ifs from the
constructor and replaced them with a better solution.  I think the
present implementation is a reasonable compromise between readability
and redundancy.  In the past the two classes were completely separate,
each consisting of a few hundred lines of code, and had a tendency to
drift apart since edits to one didn't always make it into the other.
Tim's refactoring fixed this without complicating the implementation
substantially.
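
The general shape of that compromise can be sketched as follows.  This
is a toy written for this message, not the actual _cs_matrix code: the
class names and the _swap helper are hypothetical, and the constructor
takes a dense list of rows purely to keep the example short.

```python
# Hypothetical sketch: one shared base, two thin subclasses.

class _compressed_base:
    """Holds the code common to row- and column-compressed storage."""

    def __init__(self, dense):
        # dense is a list of equal-length rows
        nrows = len(dense)
        ncols = len(dense[0]) if dense else 0
        self.shape = (nrows, ncols)
        major, minor = self._swap((nrows, ncols))
        self.indptr, self.indices, self.data = [0], [], []
        for i in range(major):
            for j in range(minor):
                r, c = self._swap((i, j))  # back to (row, col) order
                if dense[r][c] != 0:
                    self.indices.append(j)
                    self.data.append(dense[r][c])
            self.indptr.append(len(self.indices))

    def _swap(self, pair):
        raise NotImplementedError


class csr_sketch(_compressed_base):
    def _swap(self, pair):
        return pair  # rows are the major axis


class csc_sketch(_compressed_base):
    def _swap(self, pair):
        return pair[1], pair[0]  # columns are the major axis
```

The only format-specific code is the two-line _swap in each subclass,
so a fix to the shared constructor reaches both formats at once.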

> I'd also like to discuss the in-place memory assignment policy.  When
> do we copy on write, and when do we return views?  For example, taking
> a slice out of a lil_matrix returns a new sparse array.  It is
> *possible* to create a view, but it gets a bit tricky.  If each array
> had an "origin" property, such views could be trivially constructed,
> but it still does not cater for slices like x[::2].

That is a hard problem.  Can you think of specific uses of this kind
of functionality that merit the complexity of implementing it?  For
slices like x[::2] you could introduce a stride tuple in the views,
but that could get ugly fast.
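
To give a feel for what an origin plus a stride tuple would involve,
here is a minimal sketch over a toy dict-of-keys matrix.  Both classes
are hypothetical and exist only for this example; nothing like them is
in scipy.sparse.

```python
# Hypothetical sketch: a strided view over a toy dict-of-keys matrix.

class dok_sketch:
    def __init__(self, shape):
        self.shape = shape
        self.entries = {}  # (row, col) -> nonzero value

    def __getitem__(self, key):
        return self.entries.get(key, 0)

    def __setitem__(self, key, value):
        if value == 0:
            self.entries.pop(key, None)  # never store explicit zeros
        else:
            self.entries[key] = value


class strided_view:
    """Maps view index (i, j) to base index origin + (i, j) * stride."""

    def __init__(self, base, origin, stride, shape):
        self.base, self.origin = base, origin
        self.stride, self.shape = stride, shape

    def _to_base(self, key):
        for k, n in zip(key, self.shape):
            if not 0 <= k < n:
                raise IndexError("view index out of range")
        return tuple(o + k * s
                     for o, k, s in zip(self.origin, key, self.stride))

    def __getitem__(self, key):
        return self.base[self._to_base(key)]

    def __setitem__(self, key, value):
        self.base[self._to_base(key)] = value  # writes through, no copy


x = dok_sketch((4, 4))
x[2, 1] = 5.0
# Plays the role of x[::2]: every second row, all columns.
v = strided_view(x, origin=(0, 0), stride=(2, 1), shape=(2, 4))
v[0, 3] = 7.0  # lands in x[0, 3]
```

Element access is the easy part.  Anything that walks the stored
nonzeros (a matrix-vector product, say) would have to filter and
translate every entry of the base matrix through the same mapping,
which is where I'd expect it to get ugly.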

-- 
Nathan Bell wnbell at gmail.com