list of unique non-subset sets

Fri Mar 18 09:35:07 EST 2005

Looking at all the hyperedges in the connected component is a big
waste... You can look at just the hyperedges that share one or more
nodes.
(Nodes are the original letters contained in the sets, and they must be
hashable).

If nodes aren't integers in [0, len(l)) then you can use this simpler
code:

. l = [set(['a','b','c']), set(['a','c']), set(['a','d','e','1']),
.      set(['r','k','l']), set(['b','e','1']), set(['a','c']) ]
. from graph import Graph
. # http://www.fantascienza.net/animalia/graph.zip
. # you can something similar with Boost
.
. debug = True
. g = Graph()
. for n1,s in enumerate(l):
.     g.addNode(n1, nodeData=1)
.     for n2 in s:
.         g.addBiArc(n1, n2)
. if debug: print g # ****
.
. result = []
. for i,s in enumerate(l):
.     ss = set()
.     for n1 in s:
.         ss.update(g.xoutNodes(n1))
.     ss.remove(i)
.     if debug: print i, ss # ****
.
.     for sj in ss:
.         if s <= l[sj]:
.             break
.     else:
.         result.append(s)
. print result

If nodes are hashable, but they can cointain integers in [0, len(l)),
then you can use this other code with nodeID translations (in a
different graph implementation, like Gato one, such translations can be
automatic):

. l = [set(['a','b','c']), set(['a','c']), set(['a','d','e','1']),
.      set(['r','k','l']), set(['b','e','1']), set(['a','c']) ]
. from graph import Graph
.
. debug = True
. nodes = set()
. for s in l:
.     nodes.update(s)
. lenl = len(l)
. nodes = dict( (i+lenl, n) for i,n in enumerate(nodes) )
. if debug: print "nodes:", nodes # ****
. nodesid = dict( (n,i) for i,n in nodes.iteritems() )
. l2 = [ set( nodesid[n] for n in s ) for s in l ]
. if debug: print "l2:", l2 # ****
.
. g = Graph()
. for n1,s in enumerate(l2):
.     g.addNode(n1)
.     for n2 in s:
.         g.addBiArc(n1, n2)
. if debug: print g # ****
.
. result = []
. for i,s in enumerate(l2):
.     ss = set()
.     for n1 in s:
.         ss.update(g.xoutNodes(n1))
.     ss.remove(i)
.     if debug: print "i, ss:", i, ss # ****
.
.     for sj in ss:
.         if s <= l2[sj]:
.             break
.     else:
.         result.append(s)
. if debug: print "result:", result # ****
. result2 = [ set( nodes[n] for n in s ) for s in result ]
. print result2

If the hyperedges define a sparse hypergraph, then this code can be
quite faster than the original quadratic one (if len(l) is big enough).

Bye,
Bearophile