[SciPy-User] Normalizing a sparse matrix

Mon Mar 21 22:04:12 EDT 2011

On 03/20/2011 04:06 PM, coolhead.pranay at gmail.com wrote:
> Hi,
>
> I have a sparse matrix with nearly (300*10000) entries constructed out
> of a 14000*14000 matrix... In each iteration, after performing some
> operations on the sparse matrix (like multiply and dot), I have to divide
> each row of the corresponding dense matrix by the sum of its elements...

It is not well documented, not really part of the public API, and rather
low-level, but you can use scipy.sparse.sparsetools. As it is
implemented in C++, it should be both CPU- and memory-efficient.

I am using the following function to normalize each row of a CSR matrix:

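import numpy as np
# Private, low-level module; newer SciPy releases expose it as
# scipy.sparse._sparsetools instead.
from scipy.sparse.sparsetools import csr_scale_rows


def normalize_pairs(pairs):
    """Normalize rows of the pairs matrix so that sum(row) == 1 (or 0 for
    empty rows).

    Note
    ----
    Does the modification in-place. Expects floating-point data."""
    if not pairs.format == "csr":
        raise ValueError("csr only")

    # Row sums as a flat 1-D array (one entry per row), kept in the
    # dtype of pairs.data so the C++ routine sees matching types.
    factor = np.asarray(pairs.sum(axis=1), dtype=pairs.dtype).ravel()
    nnzeros = np.where(factor > 0)
    factor[nnzeros] = 1 / factor[nnzeros]

    # Multiply every stored entry of row i by factor[i], in place.
    csr_scale_rows(pairs.shape[0], pairs.shape[1], pairs.indptr,
                   pairs.indices, pairs.data, factor)
    return pairs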
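
If depending on a private module is a concern, the same in-place row
scaling can be sketched with public NumPy/SciPy calls only. This is a
minimal sketch, not a drop-in replacement: the name normalize_rows_public
is made up for illustration, and it assumes a CSR matrix holding
floating-point data (an integer matrix would need converting first).

```python
import numpy as np
from scipy import sparse


def normalize_rows_public(mat):
    """Scale each row of a CSR matrix in place so that it sums to 1.

    Hypothetical helper for illustration; empty rows are left as zeros.
    """
    if mat.format != "csr":
        raise ValueError("csr only")

    # Row sums as a flat float array, one entry per row.
    row_sums = np.asarray(mat.sum(axis=1), dtype=float).ravel()

    # Reciprocal of each positive row sum; other rows get a factor of 0.
    factor = np.zeros_like(row_sums)
    positive = row_sums > 0
    factor[positive] = 1.0 / row_sums[positive]

    # indptr[i + 1] - indptr[i] is the number of stored entries in row i,
    # so np.repeat expands factor to one value per stored entry.
    mat.data *= np.repeat(factor, np.diff(mat.indptr))
    return mat


A = sparse.csr_matrix(np.array([[1.0, 3.0, 0.0],
                                [0.0, 0.0, 0.0],
                                [2.0, 0.0, 2.0]]))
normalize_rows_public(A)
print(A.toarray())
```

The temporary array built by np.repeat has one element per stored entry,
so this uses a bit more memory than the C++ routine, but it only touches
the nonzeros and never builds the dense matrix.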

Since csr_scale_rows is not part of the public API, I don't advise
relying on normalize_pairs if reliability is a concern, but it works
well for matrices bigger than the ones you mention.

Cheers,

David