[Numpy-discussion] Faster

Fri May 2 22:36:53 EDT 2008

On Fri, May 2, 2008 at 8:02 PM, Keith Goodman <kwgoodman at gmail.com> wrote:

> On Fri, May 2, 2008 at 6:29 PM, Charles R Harris
> <charlesr.harris at gmail.com> wrote:
> > Isn't the lengthy part finding the distance between clusters?  I can think
> > of several ways to do that, but I think you will get a real speedup by doing
> > that in c or c++. I have a module made in boost python that holds clusters
> > and returns a list of lists containing their elements. Clusters are joined
> > by joining any two elements, one from each. It wouldn't take much to add a
> > distance function, but you could use the list of indices in each cluster to
> > pull a subset out of the distance matrix and then find the minimum function
> > in that. This also reminds me of Huffman codes.
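
A rough sketch of the "pull a subset out of the distance matrix" idea in
plain NumPy, for illustration only. The name cluster_distance and the toy
matrix d are made up here and are not part of the boost module; the index
lists stand in for the per-cluster element lists described above.

```python
import numpy as np

def cluster_distance(d, a, b):
    """Smallest d[i, j] with i taken from index list a and j from b."""
    # np.ix_ builds an open mesh, so this selects the len(a) x len(b) block
    # of rows in a and columns in b rather than pairing the indices up.
    return d[np.ix_(a, b)].min()

# Toy symmetric distance matrix for four points (invented values).
d = np.array([[0.0, 2.0, 5.0, 9.0],
              [2.0, 0.0, 4.0, 7.0],
              [5.0, 4.0, 0.0, 3.0],
              [9.0, 7.0, 3.0, 0.0]])

# Distance between cluster {0, 1} and cluster {2, 3}.
dist_ab = cluster_distance(d, [0, 1], [2, 3])
```

Taking the minimum over the block corresponds to joining clusters by their
two closest elements; a different reduction (max, mean) would give a
different linkage rule.
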
>
> You're right. Finding the distance is slow. Is there any way to speed
> up the function below? It returns the row and column indices of the
> min value of the NxN array x.
>
> def dist(x):
>     # Put a huge value on the diagonal so self-distances never win.
>     x = x + 1e10 * np.eye(x.shape[0])
>     i, j = np.where(x == x.min())
>     return i[0], j[0]
>
> >> x = np.random.rand(500,500)
> >> timeit dist(x)
> 100 loops, best of 3: 14.1 ms per loop
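
One thing that might be worth trying while staying in NumPy: the function
above builds an NxN identity matrix, scans for the minimum, and then scans
again in the comparison. A sketch that fills the diagonal of a copy and
uses argmin once is below. The name dist_argmin is made up for this
example, it assumes x is a square floating-point array, and I have not
timed it, so treat any speedup as something to measure rather than a given.

```python
import numpy as np

def dist_argmin(x):
    """Row and column of the smallest off-diagonal entry of square array x."""
    y = x.copy()                 # leave the caller's array untouched
    np.fill_diagonal(y, np.inf)  # exclude self-distances on the diagonal
    # argmin returns a position in the flattened array; unravel_index
    # converts it back to a (row, column) pair.
    i, j = np.unravel_index(np.argmin(y), y.shape)
    return int(i), int(j)

x = np.array([[0.0, 3.0, 1.0],
              [3.0, 0.0, 2.0],
              [1.0, 2.0, 0.0]])
pair = dist_argmin(x)
```

Like the original, this returns the first minimum in row-major order when
there are ties.
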
>
> If the clustering gives me useful results, I'll ask you about your
> boost code. I'll also take a look at Damian Eads's scipy-cluster.

That package looks nice. I think your time would be better spent learning
how to use it than in rolling your own routines.

Chuck