Comparing a lot of data efficiently
Hans Nowak
hnowak at cuci.nl
Mon Aug 27 07:17:44 EDT 2001
>===== Original Message From Marcus Stojek <stojek at part-gmbh.de> =====
>
>Okay, here is what I do. I have two files (ASCII), each of them
>containing a huge amount (let's say 100.000) triangles. The format is:
>
>Number of triangle,number of first node, number of second node , number
>of third node \n
>
>node are corners of the triangle here. Each node has three coordinates in
>space, but we don't need them here.
>
>The two files are describing exactly the same group of triangles, but the
>triangles are numbered differently in the two files. So the triangle
>containing the nodes (12,49583,331) has the number 5 in one file and the
>number 23094 in the other one. As two triangles can share two nodes I
>have to compare all three nodes for all triangles.
>
>I have two dictonaries:
>
>trione={number triangel:1.node,2.node,3.node}
>(key and values are integer)
>tritwo=as trione
>
>def common(list1,list2):
> hit=0
> for i in list1:
> for j in list2:
> if i==j:
> hit +=1
> return hit
This implies that the nodes can appear in varying orders, e.g. (1, 2, 3) is
the same node as (3, 2, 1) etc. I don't know how much control you have over
the *creation* of the ASCII files, but if you can somehow make sure the orders
of the nodes in both files are the same, there's room for considerable
optimization:
def common(list1,list2):
hit=0
if list1 == list2:
hit = 3 # faking the original value
return hit
If you cannot change the files, I'm not sure if it's feasible to give them a
fixed order when reading the files into the dictionary, e.g. by sorting the
node part. I haven't tested this, so this may just move the delay to a
different place without resulting in any overall performance increase.
Aside from this, I haven't found many possibilities for optimization. I made a
test set and tried moving the tritwo.keys() out of the outer loop, and using
lists instead of dicts... this did not result in better performance. :( That
does not mean nothing can be done though; maybe someone else on this list has
a clever idea.
HTH,
--Hans Nowak
More information about the Python-list
mailing list