[SciPy-user] fastest way to populate sparse matrix?

Peter Skomoroch peter.skomoroch at gmail.com
Wed Dec 10 18:56:38 EST 2008


Hmmm, surprisingly, the vectorized version takes longer (times in seconds):

Original method:
Filling coo_matrix
filling...
data assignment done, filling matrix 1.84783697128
total time to fill coo_matrix: 1.85190200806
done...

Vectorized:
Filling coo_matrix
filling...
data assignment done, filling matrix 3.22157812119
total time to fill coo_matrix: 3.2216091156
done...
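
For anyone wanting to reproduce the comparison, here is a minimal sketch. It is not the script that produced the timings above. The sizes n, m, r, nnz and the random sparsity pattern are assumptions for illustration, and H is stored with shape (m, r) as in Nathan's suggestion. One possible factor in the slowdown is that W[I,:] and H[J,:] are fancy-indexed copies of shape (nnz, r), and their product is a third temporary of the same size. The np.einsum variant (available in newer NumPy releases) skips that product temporary, though whether it helps will depend on the sizes involved.

```python
import time

import numpy as np
from scipy import sparse


def loop_values(W, H, I, J):
    """One dot product per nonzero entry, as in the original for-loop."""
    data = np.empty(len(I), dtype=W.dtype)
    for index, (i, j) in enumerate(zip(I, J)):
        data[index] = np.dot(W[i, :], H[j, :])
    return data


def vectorized_values(W, H, I, J):
    """Row-wise products summed along axis 1; builds (nnz, r) temporaries."""
    return (W[I, :] * H[J, :]).sum(axis=1)


def einsum_values(W, H, I, J):
    """Same row-wise dot products, without the elementwise-product temporary."""
    return np.einsum("ij,ij->i", W[I, :], H[J, :])


# Assumed problem sizes, chosen only for illustration.
n, m, r, nnz = 500, 400, 20, 5000
rng = np.random.default_rng(0)
W = rng.random((n, r)).astype(np.float32)
H = rng.random((m, r)).astype(np.float32)  # note, shape is (m, r)

# Stand-in for the real sparse matrix V: a random sparsity pattern.
V = sparse.random(n, m, density=nnz / (n * m), format="coo", random_state=0)
I, J = V.nonzero()

results = {}
for name, fn in [("loop", loop_values),
                 ("vectorized", vectorized_values),
                 ("einsum", einsum_values)]:
    t0 = time.perf_counter()
    results[name] = fn(W, H, I, J)
    print(name, time.perf_counter() - t0)
```

All three functions should agree up to float32 rounding, so the timing comparison is between equivalent computations.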


On Wed, Dec 10, 2008 at 4:46 PM, Nathan Bell <wnbell at gmail.com> wrote:
> On Wed, Dec 10, 2008 at 4:18 PM, Peter Skomoroch
> <peter.skomoroch at gmail.com> wrote:
>> Nathan,
>>
>> Thanks for the pointer, I had missed that wiki page.
>
> It's fairly recent, so don't feel bad :)
>
>>
>> The bottleneck now seems to be this for-loop, which takes the majority
>> of the remaining time (1.82258105278 seconds):
>>
>>    for index, (i,j) in enumerate(nonzero_indices):
>>        data[index] = dot(W[i,:],H[:,j])
>>
>> Is there a better approach for this assignment block?
>>
>
> You could vectorize the loop:
>
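> from numpy import float32
> from numpy.random import random
> from scipy import sparse
>
> W = random([n,r]).astype(float32)
> H = random([m,r]).astype(float32) # note, shape is (m,r)
>
> I,J = V.nonzero()                  # row and column indices of the nonzeros
> X = (W[I,:] * H[J,:]).sum(axis=1)  # X[k] = dot(W[I[k],:], H[J[k],:])
> V_approx = sparse.coo_matrix((X,(I,J)), shape=(n,m))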
>
>
> If memory usage of the above is too costly, you could use the same
> approach, but on fixed-sized chunks of the arrays.
>
> --
> Nathan Bell wnbell at gmail.com
> http://graphics.cs.uiuc.edu/~wnbell/
>
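
The chunking idea Nathan mentions could look roughly like the following. This is a sketch under assumptions: the helper name chunked_row_dots, its chunk_size default, and the small demo sizes are all hypothetical and not from the thread. It evaluates the same expression on fixed-size slices of I and J, so the temporaries are at most (chunk_size, r) instead of (nnz, r).

```python
import numpy as np
from scipy import sparse


def chunked_row_dots(W, H, I, J, chunk_size=100_000):
    """Compute X[k] = dot(W[I[k], :], H[J[k], :]) in fixed-size chunks.

    chunk_size is a hypothetical tuning knob: smaller values use less
    memory per step but run more Python-level iterations.
    """
    X = np.empty(len(I), dtype=np.result_type(W, H))
    for start in range(0, len(I), chunk_size):
        stop = start + chunk_size  # slicing past the end is safe
        rows = I[start:stop]
        cols = J[start:stop]
        X[start:stop] = (W[rows, :] * H[cols, :]).sum(axis=1)
    return X


# Small demo with assumed sizes; H has shape (m, r).
n, m, r = 30, 20, 4
rng = np.random.default_rng(0)
W = rng.random((n, r)).astype(np.float32)
H = rng.random((m, r)).astype(np.float32)

# Stand-in sparsity pattern for V.
V = sparse.random(n, m, density=0.1, format="coo", random_state=0)
I, J = V.nonzero()

# A deliberately tiny chunk size so the last chunk is a partial one.
X = chunked_row_dots(W, H, I, J, chunk_size=7)
V_approx = sparse.coo_matrix((X, (I, J)), shape=(n, m))
```

The chunk size trades memory for loop overhead, so it would need tuning against the real n, m, r and number of nonzeros.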



-- 
Peter N. Skomoroch
peter.skomoroch at gmail.com
http://www.datawrangling.com
http://del.icio.us/pskomoroch


