[SciPy-Dev] sparse vectors / matrices / tensors

David Cournapeau cournape at gmail.com
Tue Sep 20 13:26:31 EDT 2011


Yannick,

On Tue, Sep 20, 2011 at 12:06 PM, Yannick Versley <yversley at gmail.com> wrote:
> I have been working quite a lot with sparse vectors and sparse matrices
> (basically
> as feature vectors in the context of machine learning), and have noticed
> that they
> do crop up in a lot of places (e.g. the CVXOPT library, in scikits, ...) and
> that people
> tend to either reinvent the wheel (i.e. implement a complete sparse matrix
> library) or
> pretend that no separate data structure is needed (i.e. always passing along
> pairs of
> coordinate and data arrays).
> The most obvious response is to point to scipy.sparse, however I ended up
> reimplementing a sparse matrix library myself because
> - scipy.sparse is limited to matrices and has no vectors or order-k tensors
> - LIL and DOK are not really efficient or convenient data structures to
> create
>   sparse matrices (my own library basically keeps a list of unordered COO
> items
>   and compacts/sorts them when the matrix is actually used as a matrix)

I think those statementds are pretty uncontroversial. Implementing a
sparse array (not limited to 1/2 dimensions) is a non trivial
endeavour because the efficient 2d representations (csr/csc) do not
generalize well.

I started looking into the literature about those issues, and found a
few interesting data structures, such as the UB-tree
(http://www.scholarpedia.org/article/B-tree_and_UB-tree). I am
actually currently looking into implementing this for a basic sparse
tensor to get an idea of its efficiency (memory and speed-wise).

cheers,

David



More information about the SciPy-Dev mailing list