[SciPy-User] Normalizing a sparse matrix
David
david at silveregg.co.jp
Mon Mar 21 22:04:12 EDT 2011
On 03/20/2011 04:06 PM, coolhead.pranay at gmail.com wrote:
> Hi,
>
> I have a sparse matrix with nearly (300*10000) entries constructed out
> of a 14000*14000 matrix. In each iteration, after performing some
> operations on the sparse matrix (like multiply and dot), I have to divide
> each row of the corresponding dense matrix by the sum of its elements...

It is not well documented, not really part of the public API, and quite
low-level, but you can use scipy.sparse.sparsetools. As it is
implemented in C++, it should be both CPU and memory efficient.

I am using the following function to normalize each row of a CSR matrix:
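
    import numpy as np
    # Private, undocumented module (see the caveat above).
    from scipy.sparse.sparsetools import csr_scale_rows

    def normalize_pairs(pairs):
        """Normalize rows of the pairs matrix so that sum(row) == 1 (or 0 for
        empty rows).

        Note
        ----
        Does the modification in-place."""
        if not pairs.format == "csr":
            raise ValueError("csr only")
        # sum(axis=1) returns an (nrows, 1) matrix; flatten it to one factor
        # per row, kept in the same dtype as pairs.data.
        factor = np.asarray(pairs.sum(axis=1)).ravel().astype(pairs.dtype)
        nnzeros = np.where(factor > 0)
        factor[nnzeros] = 1.0 / factor[nnzeros]
        # Multiplies the stored entries of row i by factor[i], in place.
        csr_scale_rows(pairs.shape[0], pairs.shape[1], pairs.indptr,
                       pairs.indices,
                       pairs.data, factor)
        return pairs
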
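
For readers who want to see what the row scaling amounts to, here is a
minimal sketch of the same in-place normalization written with NumPy only,
operating directly on the CSR arrays. The function name
normalize_rows_inplace is hypothetical, and the sketch assumes the matrix
holds floating-point data; it is an illustration, not the sparsetools
implementation.

```python
import numpy as np
from scipy.sparse import csr_matrix


def normalize_rows_inplace(mat):
    """Scale each row of a CSR matrix in place so that it sums to 1.

    Rows whose sum is 0 (for example empty rows) are left unchanged.
    Assumes mat.data has a floating-point dtype.
    """
    if mat.format != "csr":
        raise ValueError("csr only")
    row_sums = np.asarray(mat.sum(axis=1), dtype=float).ravel()
    factor = np.zeros_like(row_sums)
    nonzero = row_sums != 0
    factor[nonzero] = 1.0 / row_sums[nonzero]
    # np.diff(indptr) is the number of stored entries in each row, so
    # np.repeat expands the per-row factor to one value per stored entry.
    mat.data *= np.repeat(factor, np.diff(mat.indptr))
    return mat


A = csr_matrix(np.array([[1.0, 3.0, 0.0],
                         [0.0, 0.0, 0.0],
                         [2.0, 0.0, 2.0]]))
normalize_rows_inplace(A)
print(A.toarray())
```

Only the data array is touched; indptr and indices are read but not
modified, so no new sparse structure is allocated.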
I don't advise using this function if reliability is a concern, since it
relies on an interface that is not public, but it works well for matrices
bigger than the ones you mention.

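Where reliability matters more than avoiding a copy, the same row
normalization can be sketched with public scipy.sparse calls only, by
left-multiplying with a diagonal matrix of inverse row sums. The function
name normalize_rows is hypothetical, and unlike the function above this
version returns a new matrix rather than modifying its argument.

```python
import numpy as np
from scipy import sparse


def normalize_rows(mat):
    """Return a CSR copy of mat whose non-empty rows each sum to 1."""
    mat = sparse.csr_matrix(mat, dtype=float)
    row_sums = np.asarray(mat.sum(axis=1)).ravel()
    inv = np.zeros_like(row_sums)
    nonzero = row_sums != 0
    inv[nonzero] = 1.0 / row_sums[nonzero]
    # Left-multiplying by diag(inv) scales row i by inv[i].
    return (sparse.diags(inv) @ mat).tocsr()


A = sparse.csr_matrix(np.array([[1.0, 3.0, 0.0],
                                [0.0, 0.0, 0.0],
                                [2.0, 0.0, 2.0]]))
N = normalize_rows(A)
print(N.toarray())
```

The product stays sparse throughout, so the 14000*14000 dense matrix never
has to be formed.
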
cheers,
David