[Numpy-discussion] Faster

Robert Kern robert.kern at gmail.com
Fri May 2 22:25:01 EDT 2008


On Fri, May 2, 2008 at 9:02 PM, Keith Goodman <kwgoodman at gmail.com> wrote:
> On Fri, May 2, 2008 at 6:29 PM, Charles R Harris
>
> <charlesr.harris at gmail.com> wrote:
>
> > Isn't the lengthy part finding the distance between clusters?  I can think
>  > of several ways to do that, but I think you will get a real speedup by doing
>  > that in c or c++. I have a module made in boost python that holds clusters
>  > and returns a list of lists containing their elements. Clusters are joined
>  > by joining any two elements, one from each. It wouldn't take much to add a
>  > distance function, but you could use the list of indices in each cluster to
>  > pull a subset out of the distance matrix and then find the minimum function
>  > in that. This also reminds me of Huffman codes.
>
>  You're right. Finding the distance is slow. Is there any way to speed
>  up the function below? It returns the row and column indices of the
>  min value of the NxN array x.
>
>  def dist(x):
>     x = x + 1e10 * np.eye(x.shape[0])
>     i, j = np.where(x == x.min())
>
>     return i[0], j[0]

Assuming x is contiguous and you can modify x in-place:


In [1]: from numpy import *

In [2]: def dist(x):
   ...:    x = x + 1e10 * eye(x.shape[0])
   ...:    i, j = where(x == x.min())
   ...:    return i[0], j[0]
   ...:

In [3]: def symmdist(N):
   ...:     x = random.rand(N, N)
   ...:     x = x + x.T
   ...:     x.flat[::N+1] = 0
   ...:     return x
   ...:

In [4]: symmdist(5)
Out[4]:
array([[ 0.        ,  0.87508654,  1.11691704,  0.80366071,  0.57966808],
       [ 0.87508654,  0.        ,  1.5521685 ,  1.74010886,  0.52156877],
       [ 1.11691704,  1.5521685 ,  0.        ,  1.22725396,  1.04101992],
       [ 0.80366071,  1.74010886,  1.22725396,  0.        ,  1.94577965],
       [ 0.57966808,  0.52156877,  1.04101992,  1.94577965,  0.        ]])

In [5]: def kerndist(x):
   ...:     N = x.shape[0]
   ...:     x.flat[::N+1] = x.max()
   ...:     ij = argmin(x.flat)
   ...:     i, j = divmod(ij, N)
   ...:     return i, j
   ...:

In [10]: x = symmdist(500)

In [15]: %timeit dist(x)
10 loops, best of 3: 19.9 ms per loop

In [16]: %timeit kerndist(x)
100 loops, best of 3: 4.38 ms per loop

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
 -- Umberto Eco



More information about the NumPy-Discussion mailing list