[SciPy-User] Speeding up a search algorithm

Wed Jun 2 18:44:35 EDT 2010

How about the following strategy:

#set up arrays of indices
X, Y = numpy.ogrid[:distancematrix.shape[0], :distancematrix.shape[1]

#pull out the relevant chunk
search = distancematrix[current, current]
x = X[current, current]
y = Y[current, current]

#make a copy of the array & make the diagonal elements large (so they're not found)
search = search.copy() + 1e20*numpy.eye(search.shape)

#find the minimum value
m = search.min()

#find all minima
mask = search == m

#get the coordinates
xs = x[mask]
ys = y[mask]

#decide which minimum to take (sorry I didn't understand you logic here)
....

hope this helps,
cheers,
David

----- Original Message ----
From: R. Padraic Springuel <R.Springuel at umit.maine.edu>
To: Scipy User Support <scipy-user at scipy.org>
Sent: Thu, 3 June, 2010 6:08:54 AM
Subject: [SciPy-User] Speeding up a search algorithm

I've got a search algorithm I've written which is designed to find the 
minimum value of a rank 2 ndarray (called search) and remember it 
(calling it m) and where that value was located (with the values of n1 
and n2).  While that may seem a simple enough task using min and argmin 
functions, there are a few of caveats that make using those functions 
impractical (at least they appear impractical to me):

1) The array being searched is an extract from a larger array (made 
using take twice on the larger array with the rows and columns being 
taken specified by a list of indecies called current).  What we want to 
know is the location of the minimum in the original array, not in the 
extract, so the values for n1 and n2 have to be taken from current.

2) The array is distance array.  I.e. it is symmetric about the main 
diagonal and all the main diagonal elements are 0.  However, these main 
diagonal elements should not count as being the minimum value of the array.

3) The minimum value does not necessarily occur only once in the array 
(even when the symmetry is taken into account) and the way to choose 
between multiple minimums can vary.  Which variation is used is 
specified by the value of aggr (None, True, or False).

As it stands, the algorithm looks like this:

search = distancematrix.take(current,axis=0).take(current,axis=1)
m = numpy.max(search)
n1 = current[0]
n2 = current[1]
if aggr:
     p1 = 0
else:
     p1 = N
for i in range(len(search)):
     for j in range(len(search)):
         if i == j:
             break
         else:
             if search[i][j] < m or numpy.isnan(m):
                 m = search[i][j]
                 n1 = current[i]
                 n2 = current[j]
                 if n1 < 0:
                     p1 = tree.pop[n1]
                 else:
                     p1 = 1
                 if n2 < 0:
                     p1 += tree.pop[n2]
                 else:
                     p1 += 1
             elif search[i][j] == m and aggr != None:
                 if current[i] < 0:
                     p2 = tree.pop[current[i]]
                 else:
                     p2 = 1
                 if current[j] < 0:
                     p2 += tree.pop[current[j]]
                 else:
                     p2 += 1
                 if p2 < p1 and not aggr:
                     n1 = current[i]
                     n2 = current[j]
                     p1 = p2
                 elif p2 > p1 and aggr:
                     n1 = current[i]
                     n2 = current[j]
                     p1 = p2

However, I've found it to be fairly slow, especially on large arrays, 
when compared to min and argmin (probably due to all the looping in 
python).  Does anyone have any suggestions for optimizing this function 
or otherwise speeding it up?
-- 

R. Padraic Springuel
Research Assistant
Department of Physics and Astronomy
University of Maine
Bennett 309
Office Hours: By Appointment Only
_______________________________________________
SciPy-User mailing list
SciPy-User at scipy.org
http://mail.scipy.org/mailman/listinfo/scipy-user