improving a huge double-for cycle

Thu Sep 18 08:45:34 EDT 2008

Alexzive wrote:

> Hello there :) ,
> 
> I am a python newbie and need to run following code for a task in an
> external simulation programm called "Abaqus" which makes use of python
> to access the mesh (ensamble of nodes with xy coordinates) of a
> certain geometrical model.
> 
> [IN is the starting input containing the nodes to be check, there are
> some double nodes with the same x and y coordinates which need to be
> removed. SN is the output containing such double nodes]
> 
> Code: Select all
>     for i in range(len(IN)): #scan all elements of the list IN
>       for j in range(len(IN)):
>         if i <> j:
>          if IN[i].coordinates[0] == IN[j].coordinates[0]:
>            if IN[i].coordinates[1] == IN[j].coordinates[1]:
>               SN.append(IN[i].label)
> 
> 
> 
> Unfortunately my len(IN) is about 100.000 and the running time about
> 15h !!!! :(

> Any idea to improve it?
> 
> I have already tried to group the "if statements" in a single one:
> 
> Code: Select all
>     if i <> j and if IN[i].coordinates[0] == IN[j].coordinates[0] and
> if IN[i].coordinates[1] == IN[j].coordinates[1]:
> 
> 
> but no improvements.
> 
> Many thanks, Alex

When you're looking for duplicates an efficient solution is likely to be
based on a set or dict object.

# untested
from collections import defaultdict

groups = defaultdict(list)
for item in IN:
    c = item.coordinates
    groups[c[0], c[1]].append(item.label)
SN = []
for labels in groups.itervalues():
    if len(labels) > 1:
        SN.extend(labels) # or labels[1:] if you want to keep one item

Peter