[Numpy-discussion] Fastest distance matrix calc
Keir Mierle
mierle at gmail.com
Mon Apr 16 19:31:55 EDT 2007
On 4/13/07, Timothy Hochberg <tim.hochberg at ieee.org> wrote:
> On 4/13/07, Bill Baxter <wbaxter at gmail.com> wrote:
> > I think someone posted some timings about this before but I don't recall.
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/498246
[snip]
> I'm going to go out on a limb and contend, without running any timings, that
> for large M and N, a solution using a for loop will beat either of those.
> For example (untested):
>
> results = empty([M, N], float)
> # You could be fancy and swap axes depending on which array is larger, but
> # I'll leave that for someone else
> for i, v in enumerate(x):
> results[i] = sqrt(sum((v-y)**2, axis=-1))
> Or something like that. The reason that I suspect this will be faster is
> that it has better locality, completely finishing a computation on a
> relatively small working set before moving onto the next one. The one liners
> have to pull the potentially large MxN array into the processor repeatedly.
In my experience, it is indeed the case that the for loop version is
faster. The fastest of the three versions offered in the above url is
the last:
from numpy import mat, zeros, newaxis
def calcDistanceMatrixFastEuclidean2(nDimPoints):
nDimPoints = array(nDimPoints)
n,m = nDimPoints.shape
delta = zeros((n,n),'d')
for d in xrange(m):
data = nDimPoints[:,d]
delta += (data - data[:,newaxis])**2
return sqrt(delta)
This is easily extended to two different nDimPoints matricies.
Cheers,
Keir
More information about the NumPy-Discussion
mailing list