exercise: partition a list by equivalence

bearophileHUGS at lycos.com bearophileHUGS at lycos.com
Sun Feb 20 06:31:33 EST 2005


John Machin:
>Perhaps I'm missing a posting of yours -- what are merge2 and merge4?
What is "this random pairs test"? What is xMax (nPairs isn't hard to
guess), and how are you generating your input data?<

merge2 and this random pairs test comes from the post by Brian Beck.
merge4 is the first in my last post. And merge3 isn't included (nor
tested) because it uses the original (slow) addArcs.
This is the simple test generator taken from his post:

xMax = 3000
nPairs = xMax*5
l = [[randint(0,xMax), randint(0,xMax)] for i in xrange(nPairs)]
For such high number of nPairs (arcs), the resulting graph usuall has
one big connected component, and few tiny debris... To produce a more
divided graph, nPairs have to be much smaller, like xMax*1.5.


>FWIW, my offering is attached. <

I'm sorry, I don't see the attach... (just the different title).


John Machin:
>Only David's uses stuff that's not in the Python 2.4 off-the-shelf
distribution.<

Well, using already mede functions and classes is good, it avoids you
to reinvent things each time. Some of my versions too use my graph
library (for a problem like this I usually don't create stand alone
versions like merge6 and merge7). And my original point was adding one
graph data structure to the standard Python library :-]
I haven't tested the speed of David Eppstein version...

Anyway, this is a simplified version of merge6, that uses an indirect
graph g, instead of the directed graph, and avoids the use of chain:

. from collections import deque
.
. def merge7(l):
.     # Undirect graph g creation:
.     g = {}
.     for n1,n2 in l:
.         if n1 not in g: g[n1] = {n2:0}
.         else: g[n1][n2] = 0
.         if n2 not in g: g[n2] = {n1:0}
.         else: g[n2][n1] = 0
.     # Connected components:
.     result = []
.     visited = set()
.     for root in g.iterkeys():
.         if root not in visited:
.             component = [root]
.             visited.add(root)
.             Q = deque() # A queue
.             Q.append(root)
.             while Q:
.                 n1 = Q.popleft()
.                 for n2 in g[n1].iterkeys():
.                     if n2 not in visited:
.                         visited.add(n2)
.                         Q.append(n2)
.                         component.append(n2)
.             result.append(component)
.     return result


It's a bit shorter and a little faster than merge6, memory used is
similar (there is only one dictionary, so there is only one
ammortization overhead of the dictionary, but it contains the sum of
both arcs, so the ammortization can be twice as big, so... memory used
can be the same).
(Usually BFS and DFS memory requirements are a little different; this
implementation uses BFS, and I think it uses a bit less memory than
DFS, but for this problem/test the difference isn't significative.)

Bear hugs,
Bearophile




More information about the Python-list mailing list