[Numpy-discussion] Efficient square distance computation

Julian Taylor jtaylor.debian at googlemail.com
Tue Oct 8 05:01:15 EDT 2013


your computation is symmetric so you only need to compute the upper or
lower triangle which will save both memory and time.


On Tue, Oct 8, 2013 at 10:06 AM, Ke Sun <sunk.cs at gmail.com> wrote:

> Dear all,
>
> I have written the following function to compute the square distances of a
> large
> matrix (each sample a row). It compute row by row and print the overall
> progress.
> The progress output is important and I didn't use matrix multiplication.
>
> I give as input a 70,000x800 matrix. The output should be a 70,000x70,000
> matrix. The program runs really slow (16 hours for 1/3 progress). And it
> eats
> 36G memory (fortunately I have enough).
>
> Could you give some insights on how to modify the code to be efficient and
> to eat less memory?
>
> thanks,
> Ke Sun
>
> def dist2_large( data ):
>     import time
>     if data.ndim != 2: raise RuntimeError( "data should be a matrix" )
>     N,D = data.shape
>
>     print 'using the sample-wise implementation'
>     print '%d samples, %d dimensions' % (N,D)
>
>     start_t = time.time()
>     d2 = np.zeros( [N,N] )
>     for i in range( N ):
>         print "\r%5d/%d" % (i+1, N),
>         for j in range( N ):
>             d2[i,j] = ((data[i] - data[j])**2).sum()
>
>     total_t = time.time() - start_t
>     hours = (total_t / 3600)
>     minutes = (total_t % 3600) / 60
>     print "\nfinished in %2dh%2dm" % (hours, minutes)
>
>     return d2
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20131008/feb298c4/attachment.html>


More information about the NumPy-Discussion mailing list