Algorithm for Creating Supersets of Smaller Sets Based on Common Elements

duncan smith buzzard at invalid.invalid
Sun Feb 22 12:02:10 EST 2015


On 21/02/15 19:46, TommyVee wrote:
> Start off with sets of elements as follows:
> 
> 1. A,B,E,F
> 2. G,H,L,P,Q
> 3. C,D,E,F
> 4. E,X,Z
> 5. L,M,R
> 6. O,M,Y
> 
> Note that sets 1, 3 and 4 all have the element 'E' in common, therefore
> they are "related" and form the following superset:
> 
> A,B,C,D,E,F,X,Z
> 
> Likewise, sets 2 and 5 have the element 'L' in common, then set 5 and 6
> have element 'M' in common, therefore they form the following superset:
> 
> G,H,L,M,O,P,Q,R,Y
> 
> I think you get the point.  As long as sets have at least 1 common
> element, they combine to form a superset.  Also "links" (common
> elements) between sets may go down multiple levels, as described in the
> second case above (2->5->6).  Cycles thankfully, are not possible.
> 
> BTW, the number of individual sets (and resultant supersets) will be
> very large.
> 
> I don't know where to start with this.  I thought about some type of
> recursive algorithm, but I'm not sure.  I could figure out the Python
> implementation easy enough, I'm just stumped on the algorithm itself.
> 
> Anybody have an idea?
> 
> Thanks, Tom

Tom,
    You could construct a graph G such that there exists an edge {v,w}
for each pair of items that appear in a common set. Each connected
component will contain the nodes of one of the supersets you're looking
for. That's unnecessarily expensive, so you can adopt a similar approach
using trees.

http://en.wikipedia.org/wiki/Disjoint-set_data_structure

Construct a forest of trees (initially one tree for each item) and join
pairs of trees until you have one tree for each of your supersets. For
each set of size n you only need to consider n-1 joins. That will ensure
that any pair of items that are in one of the sets are contained in a
single tree. The find and union operations are amortized constant, so
this should be efficient for your large numbers of sets.

The union_find module can be found at
https://github.com/DuncanSmith147/KVMS. Cheers.

Duncan


>>> import union_find
>>> sets = (['A','B','E','F'],
['G','H','L','P','Q'],
['C','D','E','F'],
['E','X','Z'],
['L','M','R'],
['O','M','Y'])
>>> uf = union_find.UnionFindTree()
>>> for a_set in sets:
	for item in a_set:
		try:
			uf.add(item)
		except ValueError:
			pass
	n = a_set[0]
	for item in a_set[1:]:
		is_merged = uf.union(n, item)

		
>>> from collections import defaultdict
>>> res = defaultdict(list)
>>> for item in uf:
	res[uf.find(item)].append(item)

	
>>> res.values()
[['G', 'H', 'M', 'L', 'O', 'Q', 'P', 'R', 'Y'], ['A', 'C', 'B', 'E',
'D', 'F', 'X', 'Z']]
>>>





More information about the Python-list mailing list