[SciPy-User] Efficiently adding a vector to every row of a sparse CSR matrix?

Pauli Virtanen pav at iki.fi
Thu Mar 7 11:00:04 EST 2013


Brendan Dolan-Gavitt <mooyix <at> gmail.com> writes:
> As part of implementing a batch calculation of
> Jensen-Shannon divergence, I need to take a (sparse)
> 65536-element vector "V" and add it to every row of
> a (sparse) 500000x65536 matrix "O" of observations.
> Is there any way to do this that is both space and
> time efficient? The usual O+V tries to convert O to
> a dense matrix, which fails because O is too big to
> fit in memory (it would take up ~120 GB!).

What do you need to do with the final matrix M = O + 1 V^T (where 1 is
the all-ones column vector of length n)? If you need it only for
matrix-vector products (e.g. in a sparse SVD), it will be cheaper to keep
M around as an abstract linear operator than to actually form the sparse
matrix, which would end up storing the same information (V) n = 500000
times over.
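A minimal sketch of the linear-operator approach, assuming real-valued
data and using scipy.sparse.linalg.LinearOperator; the helper name
rank_one_shifted_operator is hypothetical. Only products with M are ever
computed, via M x = O x + 1 (V.x) and M^T y = O^T y + V (1.y). V is kept
as a dense vector here, which costs only 65536 numbers regardless of n.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, svds


def rank_one_shifted_operator(O, v):
    """Wrap M = O + 1 v^T as a LinearOperator without ever forming M.

    Assumes real-valued data. O is an (n, m) sparse matrix; v has length m
    and may be given as a dense array or as a sparse (1, m) matrix.
    """
    O = sp.csr_matrix(O)
    n, m = O.shape
    # v is stored densely: only m numbers, independent of n.
    v = v.toarray().ravel() if sp.issparse(v) else np.asarray(v).ravel()
    if v.shape != (m,):
        raise ValueError("v must have length %d" % m)

    def matvec(x):
        x = np.ravel(x)
        # M x = O x + 1 (v . x): the scalar v . x is added to every entry.
        return O @ x + v.dot(x)

    def rmatvec(y):
        y = np.ravel(y)
        # M^T y = O^T y + v (1 . y), and 1 . y is just the sum of y.
        return O.T @ y + y.sum() * v

    return LinearOperator((n, m), matvec=matvec, rmatvec=rmatvec,
                          dtype=np.result_type(O.dtype, v.dtype))


if __name__ == "__main__":
    # Small random example: top-3 singular values of M without forming M.
    O = sp.random(1000, 200, density=0.01, format="csr", random_state=0)
    v = np.zeros(200)
    v[:5] = 1.0
    M = rank_one_shifted_operator(O, v)
    print(svds(M, k=3, return_singular_vectors=False))
```

Since svds (and other iterative solvers) only call matvec and rmatvec,
the memory cost stays at nnz(O) plus one length-m vector.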

I don't think SciPy has a specialized routine for adding a vector to each
row of a sparse matrix. To compute O + 1 V^T faster than with your
current approach, you can try writing your own routine for it.

This will probably cut the memory requirements by a factor of 2, and
will probably also speed up the computation somewhat.
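One possible sketch of such a routine in NumPy/SciPy (the name
add_row_vector_csr is hypothetical, and this is not code from the
thread). It fills the CSR arrays of the result directly: each output row
gets the original row's entries followed by one copy of V's nonzeros,
and csr_matrix.sum_duplicates() then sorts the column indices and merges
columns present in both. The result still holds up to
nnz(O) + n*nnz(V) entries, so it only fits in memory when V is quite
sparse.

```python
import numpy as np
import scipy.sparse as sp


def add_row_vector_csr(O, v):
    """Return the CSR matrix O + 1 v^T without ever densifying O.

    O : sparse matrix of shape (n, m)
    v : 1-D dense array of length m, or a sparse matrix of shape (1, m)
    """
    O = sp.csr_matrix(O)
    n, m = O.shape

    # Extract the nonzero columns and values of v.
    if sp.issparse(v):
        v = sp.csr_matrix(v, copy=True)
        if v.shape != (1, m):
            raise ValueError("v must have shape (1, %d)" % m)
        v.sum_duplicates()
        vidx, vval = v.indices, v.data
    else:
        v = np.asarray(v).ravel()
        if v.shape != (m,):
            raise ValueError("v must have length %d" % m)
        vidx = np.flatnonzero(v)
        vval = v[vidx]
    k = vidx.size

    # Row i of the output starts at O.indptr[i] + i*k: its original
    # entries come first, followed by k slots for the entries of v.
    nnz = O.indptr[-1]
    row_nnz = np.diff(O.indptr)
    indptr = O.indptr.astype(np.int64) + k * np.arange(n + 1, dtype=np.int64)
    total = int(indptr[-1])

    indices = np.empty(total, dtype=np.int64)
    data = np.empty(total, dtype=np.result_type(O.dtype, vval.dtype))

    # Entry j of O (stored in row r) moves to position j + r*k.
    rows = np.repeat(np.arange(n, dtype=np.int64), row_nnz)
    o_pos = np.arange(nnz, dtype=np.int64) + rows * k
    indices[o_pos] = O.indices[:nnz]
    data[o_pos] = O.data[:nnz]

    # The remaining slots, in row-major order, hold one copy of v per row.
    v_slots = np.ones(total, dtype=bool)
    v_slots[o_pos] = False
    indices[v_slots] = np.tile(vidx, n)
    data[v_slots] = np.tile(vval, n)

    # sum_duplicates() sorts each row's column indices and adds together
    # entries whose column appears both in O and in v.
    M = sp.csr_matrix((data, indices, indptr), shape=(n, m))
    M.sum_duplicates()
    return M
```

Everything here is vectorized, so there is no Python-level loop over the
500000 rows; the intermediate arrays are the same size as the unmerged
result.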

-- 
Pauli Virtanen
