help!: optimization of search problem (example code)

Thu Feb 14 18:45:10 EST 2002

Sorry I can't just spit out some python code, but you might want to
investigate some space partitioning trees.  The three major types are
bsp-trees quadtrees and octrees.  Inserting points will take longer than
building a list, but finding closest points should be significantly faster.

It's definately the way you want to go if your non-test data is dynamic.
--

"Marcus Stojek" <stojek at part-gmbh.de> wrote in message
news:3c6bcf70.16509609 at news.easynews.net...
> Hi,
>
> after some days of playing around I am not getting
> any further and have to ask for some more help.
>
> Here is my problem. I have two large lists ("la" and "lb") of tuples.
> Each tuple represents a point in space and has four entries:
> x, y, z, pointID
>
> Now I have to find out for each point of list "la" which point of
> "lb" is closest. The result should be for all points of "la":
>
> Point 123 of "lb" is the closest one to Point 5932 of "la"
> where 123 and 5932 is the pointID.
>
> So here is what I have so far.:
>
> ----------------------------------------------------------------
> from random import *
> from math import *
> import bisect
> import time
>
>
> # timer for benchmarking
> def startclock():
>     d=time.clock()
>     return d
>
> def stopwatch(start):
>     return time.clock()-start
>
> #generating two arbtitrary lists
> seed(0)
> a={}
> b={}
>
> for i in range(20000):
>     a[i]=(100*random(),100*random(),100*random())
>     b[i]=(100*random(),100*random(),100*random())
>
> la=[]
> lb=[]
>
> for i in range (20000):
>     dummy=(a[i][0],a[i][1],a[i][2],i)
>     la.append(dummy)
>     dummy=(b[i][0],b[i][1],b[i][2],i)
>     lb.append(dummy)
>
> lb.sort() #source list
>
> start=startclock()
> maxdist=5.0 # I know that there are no two points with a distance > 5
>
> for i in range(500): #should be range(len(la))
> #slicing a part out of lb that is closer than maxdist for the first
> #coordinate
>     lp=(la[i][0]-maxdist,la[i][1],la[i][2],la[i][3])
>     rp=(la[i][0]+maxdist,la[i][1],la[i][2],la[i][3])
>     left=bisect.bisect_right(lb,lp)
>     right=bisect.bisect_left(lb,rp)
>     if right==len(lb): right-=1
>     lsearch=lb[left:right] # only compare la[i] to this list
>
>     mindist=10000
>     minhit=-1 #pointID of closest point
>
>     for j in lsearch:
> # i can exclude points with coordinate 2 or 3 further away than
> #maxdist
>         if abs(j[1]-la[i][1])< maxdist and
>            abs(j[2]-la[i][2]) < maxdist:
> #calculate distance:
>
>
d=sqrt(pow((j[0]-la[i][0]),2)+pow((j[1]-la[i][1]),2)+pow((j[2]-la[i][2]),2))
>             if d < mindist:
>                 mindist=d
>                 minhit=j
>
>     if i <= 10 :
>    print la[i],minhit,mindist
>
> bench=stopwatch(start)
> print bench, " sec"
> --------------------------------------------------------------------------
------------------------------
>
> This takes 5.3 sec on my machine for 500 points. That is much too
> long. Any idea how I could significanty improve this code?
>
> Tanks,
> Marcus