[SciPy-user] combine two sparse matrices

Anne Archibald peridot.faceted at gmail.com
Sat May 3 17:21:34 EDT 2008


2008/5/3 Robin <robince at gmail.com>:
> Hi,
>
>  I was wondering what the most (memory) efficient way of combining two
>  sparse matrices would be.
>
>  I am constructing a very large sparse matrix, but due to the temporary
>  memory required to calculate the entries I am doing it in blocks, with
>  the computation of each block done in a forked child process. This
>  returns a sparse matrix of the same dimensions as the full one, but
>  with a smaller number of entries. I would like to add the entries from
>  the block result to the 'master' copy. I can be sure that there will
>  be no overlap in the position of entries (i.e. no matrix position will
>  be populated in both matrices).
>
>  What is the most memory efficient way of combining these? I noticed +=
>  isn't implemented, but it's not clear how that would work anyway. The
>  best I have done so far is adding two lil_matrices (the block is
>  created as a lil_matrix for fancy indexing) A = A + Apartial, but as
>  the master copy grows this means I think that I will need double the
>  final memory requirement for A (to add the last block). Is there a
>  better way of doing this?
>
>  Also, what are the memory requirements for the conversions (.tocsc,
>  .tocsr etc.)? Will that mean I need double the memory anyway?

Sparse matrices, in any of the formats implemented in scipy.sparse,
are quite inefficient at storing dense subblocks. Every stored entry
in the block must have at least a four-byte index stored alongside
its value, so a filled block costs noticeably more than the same
values held in a plain array. This need not be a problem, but it
seems to be for you. Perhaps you would be better off working directly
with the blocks, represented as dense arrays? Or, if the direct
approach is too cumbersome, you could write a quick(*) subclass that
lets you index the block as if it were part of a larger matrix.
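
To put a rough number on the per-entry index overhead, here is a
minimal sketch that stores one fully populated block both as a dense
array and as COO and CSR matrices, then compares the bytes used. The
sizes n and b are made up for illustration, and the exact totals depend
on the index dtype your SciPy build picks, so treat the result as
indicative only.

```python
import numpy as np
from scipy import sparse

# Hypothetical sizes, chosen only for illustration.
n = 10_000   # the full matrix is n x n
b = 100      # one fully populated b x b block

block = np.ones((b, b))            # float64 values
rows, cols = np.nonzero(block)     # row-major order, matching block.ravel()

# The same block embedded in an otherwise empty n x n sparse matrix.
A = sparse.coo_matrix((block.ravel(), (rows, cols)), shape=(n, n))
C = A.tocsr()

dense_bytes = block.nbytes
coo_bytes = A.data.nbytes + A.row.nbytes + A.col.nbytes
csr_bytes = C.data.nbytes + C.indices.nbytes + C.indptr.nbytes

print("dense block:", dense_bytes)
print("COO        :", coo_bytes)
print("CSR        :", csr_bytes)
```

COO keeps a row and a column index per entry; CSR keeps one column
index per entry plus a row-pointer array whose length follows the full
matrix, not the block.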

(*) Actually I'm not sure just how quick it would be, but overriding
__getitem__ and __setitem__ plus the usual initialization stuff ought
to do the job.
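
A minimal sketch of that idea, under some loud assumptions: the class
name BlockView, its constructor arguments and the offsets are all
invented here, it is a plain wrapper around a dense array and not a
subclass of any SciPy type, and it only handles single integer
(row, col) indices, not slices or fancy indexing.

```python
import numpy as np


class BlockView:
    """Dense block addressed with the coordinates of a larger matrix.

    Hypothetical helper for illustration only.
    """

    def __init__(self, shape, row_offset, col_offset, block_shape, dtype=float):
        self.shape = shape                # shape of the (virtual) full matrix
        self.row_offset = row_offset      # full-matrix row of block[0, 0]
        self.col_offset = col_offset      # full-matrix column of block[0, 0]
        self.block = np.zeros(block_shape, dtype=dtype)

    def _local(self, key):
        """Translate full-matrix (i, j) to block indices, or None if outside."""
        i, j = key
        if not (0 <= i < self.shape[0] and 0 <= j < self.shape[1]):
            raise IndexError("index outside the full matrix")
        li, lj = i - self.row_offset, j - self.col_offset
        inside = 0 <= li < self.block.shape[0] and 0 <= lj < self.block.shape[1]
        return (li, lj) if inside else None

    def __getitem__(self, key):
        local = self._local(key)
        # Positions outside the block read as zero, as in a sparse matrix.
        return self.block.dtype.type(0) if local is None else self.block[local]

    def __setitem__(self, key, value):
        local = self._local(key)
        if local is None:
            raise IndexError("cannot store a value outside the block")
        self.block[local] = value


# Usage: a 50 x 50 block sitting at rows 200.., columns 300.. of a
# 1000 x 1000 matrix that is never allocated.
view = BlockView(shape=(1000, 1000), row_offset=200, col_offset=300,
                 block_shape=(50, 50))
view[210, 325] = 3.5
print(view[210, 325], view[0, 0])
```

Writes outside the block raise an error here; whether they should
instead be dropped or routed to another block is a design choice this
sketch leaves open.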

Anne


