[SciPy-dev] feedback on scipy.sparse

Nathan Bell wnbell at gmail.com
Thu Dec 13 11:23:23 EST 2007


On Dec 13, 2007 6:31 AM, Matthieu Brucher <matthieu.brucher at gmail.com> wrote:
> Not exactly.
> I have something like :
> a = [[0, 2, 5], [3, 4], [4, 2]]
> and then some data :
> data = [1, 2, 3, 4, 5, 6, 7] or [[1, 2, 3], [4, 5], [6, 7]]
> and then the matrix would be :
>
> [[1, 0, 2, 0, 0, 3],
>  [0, 0, 0, 4, 5, 0],
>  [0, 0, 7, 0, 6, 0]]

Your list of lists nearly matches the lil_matrix format (which keeps
one array of per-row lists of column indices and a matching array of
per-row lists of values).  Below is the code for lil_matrix.tocsr(),
which is the most efficient way I've found to convert that format to
CSR:

http://projects.scipy.org/scipy/scipy/browser/trunk/scipy/sparse/sparse.py
    def tocsr(self):
        """ Return Compressed Sparse Row format arrays for this matrix.
        """

        indptr = asarray([len(x) for x in self.rows], dtype=intc)
        indptr = concatenate( ( array([0],dtype=intc), cumsum(indptr) ) )

        nnz = indptr[-1]

        indices = []
        for x in self.rows:
            indices.extend(x)
        indices = asarray(indices,dtype=intc)

        data = []
        for x in self.data:
            data.extend(x)
        data = asarray(data,dtype=self.dtype)

        return csr_matrix((data, indices, indptr), dims=self.shape)

Essentially, it computes the row pointer (indptr) from the row lengths
first and then flattens the per-row lists of column indices and values
into single arrays.  If you find something faster, let me know.
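
For illustration, here is a minimal pure-Python sketch of the same two
steps applied directly to the list of lists from the quoted message.
The function name lists_to_csr_arrays and the variable vals are made
up for this example and are not part of scipy.sparse; the nested form
of the data is assumed.

```python
from itertools import accumulate, chain


def lists_to_csr_arrays(rows, values):
    """Flatten per-row column indices and values into CSR-style lists.

    Hypothetical helper for illustration only; not a scipy.sparse function.
    """
    if len(rows) != len(values):
        raise ValueError("rows and values must have the same number of rows")
    for r, v in zip(rows, values):
        if len(r) != len(v):
            raise ValueError("each row needs one value per column index")

    # Step 1: row pointer. Row i occupies indptr[i]:indptr[i + 1]
    # in the flattened indices/data lists.
    indptr = [0] + list(accumulate(len(r) for r in rows))

    # Step 2: flatten the per-row lists, keeping their original order.
    indices = list(chain.from_iterable(rows))
    data = list(chain.from_iterable(values))
    return data, indices, indptr


# Input taken from the quoted message (nested form of the data).
a = [[0, 2, 5], [3, 4], [4, 2]]
vals = [[1, 2, 3], [4, 5], [6, 7]]

data, indices, indptr = lists_to_csr_arrays(a, vals)
print(indptr)   # [0, 3, 5, 7]
print(indices)  # [0, 2, 5, 3, 4, 4, 2]
print(data)     # [1, 2, 3, 4, 5, 6, 7]
```

The three lists could then be handed to the constructor, e.g.
csr_matrix((data, indices, indptr), shape=(3, 6)) in current SciPy
releases (the excerpt above uses the older dims= keyword).  If the data
is already a flat list, only indptr and indices need to be built.  Note
that the last row lists column 4 before column 2, so the column indices
within that row are not sorted.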

-- 
Nathan Bell wnbell at gmail.com


