help!: optimization of search problem (example code)

Thu Feb 14 16:58:50 EST 2002

stojek at part-gmbh.de (Marcus Stojek) wrote in message news:<3c6bcf70.16509609 at news.easynews.net>...
> Hi,
> 
> after some days of playing around I am not getting
> any further and have to ask for some more help.
> 
> Here is my problem. I have two large lists ("la" and "lb") of tuples.
> Each tuple represents a point in space and has four entries:
> x, y, z, pointID
> 
> Now I have to find out for each point of list "la" which point of
> "lb" is closest.
[snip]
> for i in range(20000):
>     a[i]=(100*random(),100*random(),100*random())
>     b[i]=(100*random(),100*random(),100*random())
[snip]
> maxdist=5.0 # I know that there are no two points with a distance > 5

The source of your knowledge is not obvious. Your random test data
(100 * random() for each coordinate) violates this assertion. You may
like to contemplate how you could write your code with no prior
knowledge of a presumed maximum distance, and no crutches like
sys.maxfloat.

> 
> for i in range(500): #should be range(len(la))
> #slicing a part out of lb that is closer than maxdist for the first
> #coordinate
>     lp=(la[i][0]-maxdist,la[i][1],la[i][2],la[i][3])
>     rp=(la[i][0]+maxdist,la[i][1],la[i][2],la[i][3])
>     left=bisect.bisect_right(lb,lp)
>     right=bisect.bisect_left(lb,rp)
>     if right==len(lb): right-=1

The above line strikes me as being very suspicious. It would seem to
be valid only as compensation for irregular behaviour by
bisect.bisect_left, which I find hard to believe. I suggest that you
check your boundary conditions very carefully. If the above line is
necessary, add an explanatory comment for the benefit of future
maintenance programmers :-)

>     lsearch=lb[left:right] # only compare la[i] to this list

Firstly, this excludes the lb[right] element; are you sure that's what
you want?

Secondly, this copies a slice of the lb list. As you are not mutating
the slice, there is no need to copy it. Try changing the inner for
loop to

for lb_index in range(left, right):
    j = lb[lb_index]

This may be a win or a loss, depending on the size of the slice.

Note the above is suggested as a formal equivalent of your code. You
really should not be using the name "j" like this ... i is an index of
la, but j is an element of lb! Take pity on those future maintenance
programmers and be consistent.

> 
>     mindist=10000

What happened to 5.0 ?????

[snip] 
> This takes 5.3 sec on my machine for 500 points. That is much too
> long. Any idea how I could significanty improve this code?

1. Avoid the sqrt(). If x and y are floats and x >= 0 and y >= 0 and x
< y, then you know that sqrt(x) < sqrt(y) without having to calculate
the sqrts.

2. Evaluate (for yourself, and please publish the results) which of
these is fastest in Python for squaring a float: (a) pow(x, 2) (b) x
** 2 (c) x * x

3. Hoist some of your loop-invariant expressions out of your innermost
loop:

lai0 = la[i][0]
lai1 = la[i][1]
lai2 = la[i][2]
for j in lsearch:
    if abs(j[1]-lai1) < maxdist \
    and abs(j[2]-lai2) < maxdist:
        d=sqrt(pow((j[0]-lai0),2)+pow((j[1]-lai1),2)+pow((j[2]-lai2),2))

4. Look for a better algorithm. You wouldn't be the first to have to
solve this problem.

5. Recode your best Python algorithm in C.