[Numpy-discussion] distance_matrix: how to speed up?

Emanuele Olivetti emanuele at relativita.com
Wed May 21 10:28:51 EDT 2008


Rob Hetland wrote:
> I think you want something like this:
>
> x1 = x1 * weights[np.newaxis,:]
> x2 = x2 * weights[np.newaxis,:]
>
> x1 = x1[np.newaxis, :, :]
> x2 = x2[:, np.newaxis, :]
> distance = np.sqrt( ((x1 - x2)**2).sum(axis=-1) )
>
> x1 and x2 are arrays with size of (npoints, ndimensions), and npoints  
> can be different for each array.  I'm not sure I did your weights  
> right, but that part shouldn't be so difficult.
>
>   

Weights seem not right but anyway here is the solution adapted from
Bill Baxter's :

def distance_matrix_final(data1,data2,weights):
data1w = data1*weights
dm =
(data1w*data1).sum(1)[:,None]-2*N.dot(data1w,data2.T)+(data2*data2*weights).sum(1)
dm[dm<0] = 0
return dm

This solution is super-fast, stable and use little memory.
It is based on the fact that:
(x-y)^2*w = x*x*w - 2*x*y*w + y*y*w

For size1=size2=dimensions=1000 requires ~0.6sec. to compute
on my dual core duo. It is 2 order of magnitude faster than my
previous solution, but 1-2 order of magnitude slower than using
C with weave.inline.

Definitely good enough for me.


Emanuele




More information about the NumPy-Discussion mailing list