[SciPy-User] Speeding up a search algorithm
David Baddeley
david_baddeley at yahoo.com.au
Wed Jun 2 18:44:35 EDT 2010
How about the following strategy:
#set up arrays of indices
X, Y = numpy.ogrid[:distancematrix.shape[0], :distancematrix.shape[1]
#pull out the relevant chunk
search = distancematrix[current, current]
x = X[current, current]
y = Y[current, current]
#make a copy of the array & make the diagonal elements large (so they're not found)
search = search.copy() + 1e20*numpy.eye(search.shape)
#find the minimum value
m = search.min()
#find all minima
mask = search == m
#get the coordinates
xs = x[mask]
ys = y[mask]
#decide which minimum to take (sorry I didn't understand you logic here)
....
hope this helps,
cheers,
David
----- Original Message ----
From: R. Padraic Springuel <R.Springuel at umit.maine.edu>
To: Scipy User Support <scipy-user at scipy.org>
Sent: Thu, 3 June, 2010 6:08:54 AM
Subject: [SciPy-User] Speeding up a search algorithm
I've got a search algorithm I've written which is designed to find the
minimum value of a rank 2 ndarray (called search) and remember it
(calling it m) and where that value was located (with the values of n1
and n2). While that may seem a simple enough task using min and argmin
functions, there are a few of caveats that make using those functions
impractical (at least they appear impractical to me):
1) The array being searched is an extract from a larger array (made
using take twice on the larger array with the rows and columns being
taken specified by a list of indecies called current). What we want to
know is the location of the minimum in the original array, not in the
extract, so the values for n1 and n2 have to be taken from current.
2) The array is distance array. I.e. it is symmetric about the main
diagonal and all the main diagonal elements are 0. However, these main
diagonal elements should not count as being the minimum value of the array.
3) The minimum value does not necessarily occur only once in the array
(even when the symmetry is taken into account) and the way to choose
between multiple minimums can vary. Which variation is used is
specified by the value of aggr (None, True, or False).
As it stands, the algorithm looks like this:
search = distancematrix.take(current,axis=0).take(current,axis=1)
m = numpy.max(search)
n1 = current[0]
n2 = current[1]
if aggr:
p1 = 0
else:
p1 = N
for i in range(len(search)):
for j in range(len(search)):
if i == j:
break
else:
if search[i][j] < m or numpy.isnan(m):
m = search[i][j]
n1 = current[i]
n2 = current[j]
if n1 < 0:
p1 = tree.pop[n1]
else:
p1 = 1
if n2 < 0:
p1 += tree.pop[n2]
else:
p1 += 1
elif search[i][j] == m and aggr != None:
if current[i] < 0:
p2 = tree.pop[current[i]]
else:
p2 = 1
if current[j] < 0:
p2 += tree.pop[current[j]]
else:
p2 += 1
if p2 < p1 and not aggr:
n1 = current[i]
n2 = current[j]
p1 = p2
elif p2 > p1 and aggr:
n1 = current[i]
n2 = current[j]
p1 = p2
However, I've found it to be fairly slow, especially on large arrays,
when compared to min and argmin (probably due to all the looping in
python). Does anyone have any suggestions for optimizing this function
or otherwise speeding it up?
--
R. Padraic Springuel
Research Assistant
Department of Physics and Astronomy
University of Maine
Bennett 309
Office Hours: By Appointment Only
_______________________________________________
SciPy-User mailing list
SciPy-User at scipy.org
http://mail.scipy.org/mailman/listinfo/scipy-user
More information about the SciPy-User
mailing list