Comparing a lot of data efficiently

Marcus Stojek stojek at part-gmbh.de
Mon Aug 27 05:34:23 EDT 2001


Hi,
first let me say that I'm new to Python and that I really love it. 
Finally I can spend my time solving my problems and not fighting with the 
programming language.

Okay, here is what I do. I have two files (ASCII), each of them 
containing a huge amount (let's say 100.000) triangles. The format is:

Number of triangle,number of first node, number of second node , number 
of third node \n

node are corners of the triangle here. Each node has three coordinates in 
space, but we don't need them here.

The two files are describing exactly the same group of triangles, but the 
triangles are numbered differently in the two files. So the triangle 
containing the nodes (12,49583,331) has the number 5 in one file and the 
number 23094 in the other one. As two triangles can share two nodes I 
have to compare all three nodes for all triangles.

I have two dictonaries:

trione={number triangel:1.node,2.node,3.node}
(key and values are integer)
tritwo=as trione

def common(list1,list2):
    hit=0
    for i in list1:
        for j in list2:
            if i==j:
                hit +=1
    return hit

for t1 in trione.keys():
    for t2 in tritwo.keys():
        if common(trione[t1],tritwo[t2])>=3:
            out.write("%i;%i \n" % (t1,t2))



Works fine but it takes years. I tried to remove from the dictionaries 
the triangles that already have been identified to reduce data.
(using pop or slicing) but this slowed the process even more. I know I 
could speed up things by breaking the loops in common() when one node 
from list1 is not in list2, but for future use I have to deal with shapes 
containing more than three nodes.

Any idea how I could significantly improve perfomance? In "Programming 
Python" I learned about tuple stacks, binary search trees and graphs but 
I'm not sure whether one of those will help. Do I really have to creep 
back to C? If so, can anyone tell me how to get my dictonaries into a C 
routine? (I know I have to use SWIG, but a little example would help a 
lot.)

Thanks in advance.

Marcus




More information about the Python-list mailing list