improving a huge double-for cycle

Tim Chase python.list at tim.thechases.com
Thu Sep 18 08:57:16 EDT 2008


> Code: Select all
>     for i in range(len(IN)): #scan all elements of the list IN
>       for j in range(len(IN)):
>         if i <> j:
>          if IN[i].coordinates[0] == IN[j].coordinates[0]:
>            if IN[i].coordinates[1] == IN[j].coordinates[1]:
>               SN.append(IN[i].label)
> 
> 
> Unfortunately my len(IN) is about 100.000 and the running time about
> 15h !!!! :(
> 
> Any idea to improve it?
[snip]
> I have already tried to group the "if statements" in a single one:
> 
> Code: Select all
>     if i <> j and if IN[i].coordinates[0] == IN[j].coordinates[0] and
> if IN[i].coordinates[1] == IN[j].coordinates[1]:
> 
> but no improvements.

It's like rearranging deck-chairs on the Titanic :)  Yes, it may 
give a speed up, but what's 3 seconds when you're waiting 15hr :)

Not knowing the len(IN[x].coordinates) or their structure, if 
it's a list of len==2, you should be able to just do

   if i <> j and IN[i].coordinates == IN[j].coordinates

or

   if i <> j and IN[i].coordinates[:2] == IN[j].coordinates[:2]

However, again, this is just polish.  The big problem is that you 
have an O(N^2) algorithm that's killing you.

1) use xrange instead of range to save eating memory with a huge 
unneeded array.

2) unless you need to append duplicate labels, you know that when 
I and J are swapped, you'll reach the same condition again, so it 
might be worth writing the outer loops to eliminate this 
scenario, and in the process, but just starting at i+1 rather 
than i, you can forgo the check if "i<>j".

Such changes might look something like

   for i in xrange(len(IN)):
     for j in xrange(i+1, len(IN)):
       if IN[i].coordinates == IN[j].coordinates:
         SN.append(IN[i].label)

If my college algorithms memory serves me sufficiently, this 
reduces your O(N^2) to O(N log N) which will garner you some 
decent time savings.

-tkc





More information about the Python-list mailing list