[SciPy-dev] sparse comments and questions
Ed Schofield
edschofield at gmail.com
Sun Jul 8 06:51:40 EDT 2007
On 7/8/07, Nathan Bell <wnbell at gmail.com> wrote:
> Currently the sparse matrix storage formats (e.g. sparse.csr_matrix )
> only support integer indices ( numpy.intc ) and floating point data
> (float32, float64, complex64, and complex128). While this addresses
> the common usage, it would be nice to support other index and data
> formats also. For example, when using a csr_matrix to represent a
> graph the dtype is mostly irrelevant, so smaller types like uint8
> would be desirable. Likewise, on a 64-bit machine, one might like to
> use 32-bit indices (instead of the default C-int size) to economize on
> space.
>
> The sparsetools routines are fully templated, so it's a simple matter
> to support the other types there. However, I don't know whether other
> subpackages that use sparse matrices would require changes. Any
> thoughts?
Yes, good idea. I don't imagine preserving compatibility with existing
packages would be difficult.
> I have a few other questions and comments regarding the sparse package.
>
> 1) Does anyone depend on the .ftype attribute of the sparse formats?
> Can it be eliminated? If not, can it be trapped by __getattr__() so
> that it doesn't become unsynchronized from data.dtype?
As far as I'm concerned it can be eliminated. But it's been around for
years (before I started working on sparse) and I have no idea what it
does. Travis?
> 3) Should we allow modifications to CSR and CSC matrices that force
> O(N) updates/reallocations. For instance, adding a new value into a
> CSR matrix is essentially as expensive as building a new one from
> scratch. Could we instead prohibit such modifications and raise an
> exception informing the user that lil_matrix is the proper format for
> such operations? Note that changing an existing matrix value is not
> so costly (typically O(1)), so not all calls to __setitem__ would be
> banned.
Good point. There's an argument to be made for protecting users from
themselves. Might there be a need for the occasional O(N) tweak to an
existing CSR/C matrix? I don't know. I suggest we spin off this
functionality from __setitem__ into a separate module-level function
called modify_cs_matrix_for_masochists().
> 4) lil_matrix now supports more sophisticated slicing, but the
> implementation is pretty hairy and not especially fast. Is anyone
> looking at this? Does the slicing work correctly? I fixed a bug
> related to negative indices, but I'm not completely confident in the
> robustness of my fix.
The implementation seems cleaner now as a result of your work. To have
more confidence in the slicing we also want more unit tests ...
> 5) Scipy's spdiags() works differently than the MATLAB version.
> Should it be changed to coincide? If we later support a sparse
> diagonal format, should the output of spdiags change to dia_matrix
> instead of the present csc_matrix?
Yes, I think so.
> 6) When working with sparse matrices, do people expect the result to
> be of the same type? For example, it's far more efficient to make
> transpose(csr_matrix) output a csc_matrix (as is currently done in
> SciPy) since that's a O(1) operation (as opposed to an O(n) cost
> otherwise). So in general, should a method to return the result in
> most efficient output format, relying on the user to convert it to
> another format if so desired? I.e. if you need a csr_matrix at a
> given point, then you should call .tocsr() to be sure. If it's
> already a csr_matix, then no conversion will be done.
Yes, definitely.
> 7) Where do functions that operate on sparse matrices belong? For
> example, suppose I wrote a function to extract the lower diagonal
> entries of a matrix. Should it be added to sparse.py (like spdiags)
> or somewhere else?
Yes, sparse.py is a good place for it.
-- Ed
More information about the SciPy-Dev
mailing list