[Numpy-discussion] Efficient square distance computation

Ke Sun sunk.cs at gmail.com
Tue Oct 8 07:38:28 EDT 2013


On Tue, Oct 08, 2013 at 01:49:14AM -0700, Matthew Brett wrote:
> Hi,
> 
> On Tue, Oct 8, 2013 at 1:06 AM, Ke Sun <sunk.cs at gmail.com> wrote:
> > Dear all,
> >
> > I have written the following function to compute the square distances of a large
> > matrix (each sample a row). It compute row by row and print the overall progress.
> > The progress output is important and I didn't use matrix multiplication.
> >
> > I give as input a 70,000x800 matrix. The output should be a 70,000x70,000
> > matrix. The program runs really slow (16 hours for 1/3 progress). And it eats
> > 36G memory (fortunately I have enough).
> 
> That is very slow.
> 
> As a matter of interest - why didn't you use matrix multiplication?
Because it will cost hours and I want to see the progress and 
know how far it goes. Another concern is to save memory and
compute one sample at a time.

> On a machine I had access to it took about 20 minutes.
How? I am using matrix multiplication (the same code as 
http://stackoverflow.com/a/4856692) and it runs for around 18 hours.

Best,
Ke Sun
> 
> You've got a 70000 by 70000 element output matrix so I think that's
> 37G already (if the matrix is double precision float).
> 
> > Could you give some insights on how to modify the code to be efficient and
> > to eat less memory?
> 
> You could try using Cython - but I'm guessing that the BLAS routines
> in numpy will already do this the most efficient way.



More information about the NumPy-Discussion mailing list