exercise: partition a list by equivalence

Fri Feb 18 21:39:44 EST 2005

In article <1108768870.455974.244590 at c13g2000cwb.googlegroups.com>,
 "John Machin" <sjmachin at lexicon.net> wrote:

> You don't need to think. This problem has been extensively picked over
> by folk who are a lot smarter than us, starting from 30 years ago.
> Google for "disjoint set" and "union-find". One gets the impression
> that the best possible algorithm would be slightly *worse* than O(n).

It can be solved by union-find
(e.g. with UnionFind from <http://www.ics.uci.edu/~eppstein/PADS/>):

import UnionFind
import sets

def merge(pairs):
    uf = UnionFind.UnionFind()
    items = sets.Set()
    for a,b in pairs:
        uf.union(a,b)
        items.add(a)
        items.add(b)
    leaders = {}
    for a in items:
        leaders.setdefault(uf[a],sets.Set()).add(a)
    return [list(component) for component in leaders.values()]

If you do all the unions before all the finds, as in this algorithm, 
union-find is O(n).

However it might be easier to treat the input pairs as the edges of a 
graph and apply a depth-first-search based graph connected components 
algorithm.

-- 
David Eppstein
Computer Science Dept., Univ. of California, Irvine
http://www.ics.uci.edu/~eppstein/