[Numpy-discussion] Faster

Sat May 3 20:31:05 EDT 2008

On Sat, May 3, 2008 at 5:05 PM, Christopher Barker
<Chris.Barker at noaa.gov> wrote:
> Robert Kern wrote:
>  > I can get a ~20% improvement with the following:
>
>
>  > In [9]: def mycut(x, i):
>  >    ...:     A = x[:i,:i]
>  >    ...:     B = x[:i,i+1:]
>  >    ...:     C = x[i+1:,:i]
>  >    ...:     D = x[i+1:,i+1:]
>  >    ...:     return hstack([vstack([A,C]),vstack([B,D])])
>
>  Might it be a touch faster to build the final array first, then fill it:
>
>  def mycut(x, i):
>      r,c = x.shape
>      out = np.empty((r-1, c-1), dtype=x.dtype)
>      out[:i,:i] = x[:i,:i]
>      out[:i,i:] = x[:i,i+1:]
>      out[i:,:i] = x[i+1:,:i]
>      out[i:,i:] = x[i+1:,i+1:]
>      return out
>
>  totally untested.
>
>  That should save the creation of two temporaries.

Preallocating the output array makes sense. And it is super fast, about 5x faster here:

>> timeit mycut(x, 6)
100 loops, best of 3: 7.48 ms per loop
>> timeit mycut2(x, 6)
1000 loops, best of 3: 1.5 ms per loop

The time it takes to cluster went from about 1.9 seconds to 0.7
seconds! Thank you.
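
For reference, here is a self-contained sketch of the two versions side by side, checked against np.delete. It assumes mycut2 is the name of the version that fills a preallocated array, and its last assignment uses out[i:, i:] so that the shapes on both sides match. The small test array x is made up for illustration.

```python
import numpy as np

def mycut(x, i):
    # Stacking version: slice out the four blocks around row/column i
    # and glue them back together (creates temporaries).
    A = x[:i, :i]
    B = x[:i, i+1:]
    C = x[i+1:, :i]
    D = x[i+1:, i+1:]
    return np.hstack([np.vstack([A, C]), np.vstack([B, D])])

def mycut2(x, i):
    # Preallocating version: build the output once, then copy the four
    # blocks straight into place.
    r, c = x.shape
    out = np.empty((r - 1, c - 1), dtype=x.dtype)
    out[:i, :i] = x[:i, :i]
    out[:i, i:] = x[:i, i+1:]
    out[i:, :i] = x[i+1:, :i]
    out[i:, i:] = x[i+1:, i+1:]
    return out

# Made-up test data: a 5x5 array with row 2 and column 2 removed.
x = np.arange(25.0).reshape(5, 5)
expected = np.delete(np.delete(x, 2, axis=0), 2, axis=1)
print(np.array_equal(mycut(x, 2), expected))
print(np.array_equal(mycut2(x, 2), expected))
```

Both functions should agree with the double np.delete call, which is the slow but obviously correct way to drop one row and one column.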

When I run the single linkage clustering on my data I get one big
cluster and a bunch of tiny clusters. So I need to try a different
linkage method. Average linkage sounds good, but it looks hard to
code.
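
A rough sketch of what the average linkage update might look like, assuming the clustering keeps a full square distance matrix and a list of cluster sizes. When clusters i and j merge, the distance from the merged cluster to any other cluster k is the size-weighted mean of the two old distances, (n_i*d_ik + n_j*d_jk)/(n_i + n_j). The helper name merge_average and the three-point example are hypothetical, not taken from the clustering code discussed here.

```python
import numpy as np

def merge_average(d, sizes, i, j):
    """Merge clusters i and j under average linkage (hypothetical helper).

    d     : square symmetric distance matrix between current clusters
    sizes : number of original points in each cluster
    Returns the reduced distance matrix and the updated size list.
    """
    d = np.array(d, dtype=float)  # work on a float copy
    ni, nj = sizes[i], sizes[j]
    # Size-weighted mean of the two old rows gives the new distances.
    new_row = (ni * d[i] + nj * d[j]) / (ni + nj)
    d[i, :] = new_row
    d[:, i] = new_row
    d[i, i] = 0.0
    # Cluster j is absorbed into i, so drop its row and column.
    d = np.delete(np.delete(d, j, axis=0), j, axis=1)
    sizes = list(sizes)
    sizes[i] = ni + nj
    del sizes[j]
    return d, sizes

# Made-up example: three points on a line at 0, 1 and 5.
d0 = np.array([[0.0, 1.0, 5.0],
               [1.0, 0.0, 4.0],
               [5.0, 4.0, 0.0]])
d1, sizes1 = merge_average(d0, [1, 1, 1], 0, 1)
print(d1)
print(sizes1)
```

Merging the two closest points leaves a 2x2 matrix whose off-diagonal entry is the mean of 5 and 4. The row and column removal at the end is the same cut operation timed above, so the preallocating version could be swapped in for the np.delete calls.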