exercise: partition a list by equivalence

bearophileHUGS at lycos.com bearophileHUGS at lycos.com
Sat Feb 19 20:56:27 EST 2005


John Machin>FWIW, here's a brief UAT report:

Here is something else for you.
Note: for more correct comparisons, for the following tests I've
disabled the use of Psyco in Graph (usually enabled, if present).
This looks faster than merge2 for this (quite sparse) random pairs
test:

. import graph # http://www.fantascienza.net/animalia/graph.zip
. def merge4(l):
.     g = graph.Graph()
.     for n1,n2 in l:
.         g.addArc(n1,n2)
.     return g.connectedComponents()

Creating a better addArcs produces good results (addArcs method is
present in Graph, but addArcs2 isn't present):

. def merge5(l):
.     g = graph.Graph()
.     g.addArcs2(l)
.     return g.connectedComponents()


If you like to see the actual code, without using Graph, I've made a
very stripped down version of addArcs2 and connectedComponents:

. from collections import deque
. from itertools import chain
.
. def merge6(l):
.     # graph creation
.     gi = {}
.     go = {}
.     for n1,n2 in l:
.         if n1 not in go:
.             go[n1] = {}
.             gi[n1] = {}
.         if n2 not in go:
.             go[n2] = {}
.             gi[n2] = {}
.         go[n1][n2] = 0
.         gi[n2][n1] = 0
.     # Connected components:
.     result = []
.     visited = set()
.     for root in go.iterkeys():
.         if root not in visited:
.             component = [root]
.             visited.add(root)
.             Q = deque() # A queue
.             Q.append(root)
.             while Q:
.                 n1 = Q.popleft()
.                 for n2 in chain( go[n1].iterkeys(), gi[n1].iterkeys()
):
.                     if n2 not in visited:
.                         visited.add(n2)
.                         Q.append(n2)
.                         component.append(n2)
.             result.append(component)
.     return result


At the end of the tests I've added this to be sure the results are
always the same:

r2 = set( frozenset(e) for e in merge2(l) )
r4 = set( frozenset(e) for e in merge4(l) )
r5 = set( frozenset(e) for e in merge5(l) )
r6 = set( frozenset(e) for e in merge6(l) )
assert r2 == r4
assert r2 == r6

For xMax, nPairs = 1000, 5000:
merge2: 15750 function calls in 0.939 CPU seconds
merge4: 11029 function calls in 0.560 CPU seconds
merge5: 6030 function calls in 0.303 CPU seconds
merge6: 6011 function calls in 0.297 CPU seconds

For xMax, nPairs = 2000, 10000:
merge2: 31503 function calls in 2.505 CPU seconds
merge6: 12011 function calls in 0.639 CPU seconds

Timings using clock() (much more reliable than the profilers!), for
xMax, nPairs = 2000, 10000:
merge2: 1.222
merge6: 0.201

Probably merge6 can be improved, reducing memory used...

Bear hugs,
Bearophile




More information about the Python-list mailing list