[SciPy-Dev] Sparse boolean specification

Pauli Virtanen pav at iki.fi
Mon Apr 22 04:23:24 EDT 2013


Blake Griffith <blake.a.griffith <at> gmail.com> writes:
[clip]
> When comparing two sparse matrices of the same type,
> a sparse matrix of the same type, with bool dtype, should
> be returned with all True elements with original sets of
> elements. An element without a corresponding value in the
> other spmatrix is False. Does this make sense? An example:
> 
> >>> coo_matrix([True, True], [1,1], [2,2]) == coo_matrix([True, True], [1,3], [1,3])
> coo_matrix([True], [1], [1])

I think the user expectation here is rather clear: for sparse
matrices A, B and sparse matrix type spmatrix, every boolean
operation

    boolean_op(A, B)

should be fully equivalent to

    spmatrix(boolean_op(A.todense(), B.todense()))

Deviating from this will undoubtedly lead to surprises and bugs,
and IMHO would be a design wart.
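
To make the rule concrete, here is a minimal sketch of it as a
reference function. The name dense_equivalent_op is made up for
illustration, csr_matrix stands in for "spmatrix", and toarray()
is used instead of todense() only to get a plain ndarray. It is
meant as a specification to test against, not as an efficient
implementation.

```python
import operator

import numpy as np
from scipy import sparse


def dense_equivalent_op(op, A, B):
    """Reference semantics: apply op to dense copies, then re-sparsify.

    Hypothetical helper; csr_matrix is an arbitrary choice of format.
    """
    return sparse.csr_matrix(op(A.toarray(), B.toarray()))


A = sparse.csr_matrix(np.array([[0, 1], [2, 0]]))
B = sparse.csr_matrix(np.array([[0, 1], [3, 0]]))

# Only position (1, 0) differs, so != has a single True entry,
# while == is True everywhere else, including where both are zero.
ne = dense_equivalent_op(operator.ne, A, B)
eq = dense_equivalent_op(operator.eq, A, B)
```

Any sparse-native implementation of a comparison can then be
checked against what this function returns.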

The drawback here is that `==`, `<=`, `>=` become somewhat
useless for sparse matrices, as they tend to produce matrices
filled with True wherever both operands are zero. But I think
this cannot be helped.
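
A small sketch of how quickly the result fills up, using two
diagonal matrices chosen only for illustration. The comparison
is done on dense copies so that it shows the intended semantics
rather than the behaviour of any particular scipy version.

```python
import numpy as np
from scipy import sparse

n = 100
A = sparse.identity(n, format="csr")      # n stored entries
B = 2 * sparse.identity(n, format="csr")  # same pattern, other values

# Dense-equivalent comparison results.
eq_dense = A.toarray() == B.toarray()
ne_dense = A.toarray() != B.toarray()

# == is True at every off-diagonal position (0 == 0), whereas
# != is True only on the diagonal and so stays sparse.
eq_true = int(np.count_nonzero(eq_dense))
ne_true = int(np.count_nonzero(ne_dense))
```

Here the `==` result has n*n - n True entries against n stored
entries in each input, while the `!=` result stays as sparse as
the inputs.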

This doesn't exclude adding other boolean ops, for instance
ones that work inside the union of the sparsity patterns,
which I think is what you proposed. These could be added, but
should be added either as new functions (my preference) or
methods, and you should have some use cases showing where
they would be useful.
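
As a rough sketch of what such a separate function could look
like: the name eq_on_pattern_union is hypothetical and nothing
like it exists in scipy.sparse. It evaluates `==` only at the
positions stored in either operand and keeps the False results
as explicitly stored entries, so the result's pattern is the
union of the input patterns.

```python
import numpy as np
from scipy import sparse


def eq_on_pattern_union(A, B):
    """Hypothetical helper: A == B evaluated only on the union of
    the stored positions of A and B. Returns a COO matrix."""
    if A.shape != B.shape:
        raise ValueError("shape mismatch")
    n_cols = A.shape[1]
    A_coo, B_coo = A.tocoo(), B.tocoo()
    # Encode (row, col) pairs as linear indices to take the union.
    lin = np.unique(np.concatenate([
        A_coo.row.astype(np.int64) * n_cols + A_coo.col,
        B_coo.row.astype(np.int64) * n_cols + B_coo.col,
    ]))
    rows, cols = divmod(lin, n_cols)
    # Look up both operands at every position in the union.
    a_vals = np.asarray(A.tocsr()[rows, cols]).ravel()
    b_vals = np.asarray(B.tocsr()[rows, cols]).ravel()
    return sparse.coo_matrix((a_vals == b_vals, (rows, cols)),
                             shape=A.shape)


A = sparse.csr_matrix(np.array([[0, 1], [2, 0]]))
B = sparse.csr_matrix(np.array([[0, 1], [0, 3]]))
result = eq_on_pattern_union(A, B)
```

Positions outside the union are simply not stored, which is
exactly where this differs from the dense-equivalent `==`.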

(By the way, there could be some room for making it easier to
deal with sparsity patterns. Not sure what exactly, but it's
probably possible to think of use cases in this direction where
things could be improved. For PDE matrix assembly, for instance,
it's commonly the case that the sparsity pattern stays constant
but the values change. It can be worthwhile here to take a look
at what PETSc and other packages offer.)
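
One way the constant-pattern case can be handled today, as a
sketch only: build the CSR structure once and reuse its
indices/indptr arrays with fresh data. The helper name
with_new_values and the tridiagonal pattern are assumptions
made for illustration.

```python
import numpy as np
from scipy import sparse

n = 5
# Tridiagonal pattern, e.g. from a 1-D finite-difference stencil.
pattern = sparse.diags([1, 1, 1], [-1, 0, 1], shape=(n, n),
                       format="csr")


def with_new_values(pattern, data):
    """Hypothetical helper: same CSR structure, different values."""
    data = np.asarray(data)
    if data.shape != pattern.data.shape:
        raise ValueError("need one value per stored entry")
    return sparse.csr_matrix(
        (data, pattern.indices, pattern.indptr), shape=pattern.shape)


# New values for the 3*n - 2 = 13 stored entries, in CSR order.
updated = with_new_values(pattern, np.arange(1.0, 14.0))
```

The caller has to supply the values in the storage order of the
pattern, which is the part a friendlier API could hide.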

> When comparing sparse matrices with numpy ndarrays or
> matrices, the sparse matrix can probably be easily expressed
> as a dense matrix. So we should call spmatrix.toarray() or
> spmatrix.todense() and compare it with the ndarray or
> matrix, returning an ndarray or matrix with bool
> dtype where each element is a[i,j] = (b[i,j] == c[i,j])
> for comparing B == C. Likewise for other comparisons.
> 
> Does this sound good so far? 

Sparse/dense comparisons can probably just as well return a
dense result.

If broadcasting is implemented, returning dense may not be
the best choice, as the result can well be sparse in that case.
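
A minimal sketch of both cases, with a made-up helper name
compare_sparse_dense. It densifies the sparse operand and lets
numpy do the comparison and the broadcasting.

```python
import operator

import numpy as np
from scipy import sparse


def compare_sparse_dense(op, S, D):
    """Hypothetical helper: densify S and compare with ndarray D."""
    return op(S.toarray(), np.asarray(D))


S = sparse.csr_matrix(np.array([[0, 1], [2, 0]]))

# Same-shape comparison: the dense operand fixes the cost anyway.
full = compare_sparse_dense(operator.eq, S, np.array([[0, 1], [2, 5]]))

# Broadcasting against a single row of zeros: the result is True
# only where S is nonzero, so it is as sparse as S itself.
bcast = compare_sparse_dense(operator.ne, S, np.zeros((1, 2)))
```

In the second call the dense operand holds only one row, so a
dense n-by-n result is much larger than either input.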

> Also, should I write this up like a PEP?

We don't have a formal process for additions, but for larger
feature additions it can be useful to have a writeup at hand.

    ***

Regarding the bigger picture:

One problem with scipy.sparse is that there are 6 different
sparse matrix types, which multiplies the effort involved
by a factor of 6.

As I see it, the order of priority in implementing new features
would be CSR & CSC > LIL > the others.

Also, it will probably be better to have a 100% working
implementation for CSR+CSC, rather than 80% working
implementations for all types.

-- 
Pauli Virtanen



