[Edu-sig] Algorithm help Subsets

Danny Yoo dyoo@hkn.eecs.berkeley.edu
Sat, 2 Mar 2002 13:41:52 -0800 (PST)


On Sat, 2 Mar 2002, Arthur Siegel wrote:

> I haven't been able to come up with a generalization - an alogrithm with a
> second parameter as the number of elements I want in the returned subset.
> 
> I am sure these are well-worn issues - but I haven't found the right
> resource for answers.

Hi Arthur,

We can solve this problem if we look at it recursively.  Let's say that
unique2() and unique3() are special cases of a generalized subsets()
function.  This subsets() function should take as arguments the sequence
we're choosing from, and the number of elements we'd like for each subset.


For example, if we had the sequence:

    sequence = ['a', 'b', 'c']

then subsets(sequence, 1) should give us:

    [['a'], ['b'], ['c']]

and subsets(sequence, 2) should give us:

    [['a', 'b'], ['a', 'c'], ['b', 'c']]



If we look at our original sequence:
                       
    ['a', 'b', 'c']

and all its subsets of length 2:

    [('a', 'b'), ('a', 'c'), ('b', 'c')]

we can split the subsets down into two distinct categories: the subsets
that contain the first letter 'a':

    [('a', 'b'), ('a', 'c')]

and the ones that don't contain the first letter:

    [('b', 'c')]

The first case turns out to be subsets(['b', 'c'], 1), with a bunch of
"a"'s appended to the front of each sub-subset.  The second case is
subsets(['b', 'c'], 2).


When we build those subsets, we can construct those two sub-subset
categories, and put them together.  As we build them, we have a choice of
either including the first element, or not including it. Here is a
recursive definition of the subsets function:

###
def subsets(sequence, n):
    """Returns all subsets of length n from the sequence."""
    if not sequence or n == 0:
        return []
    if n == 1: return map(lambda x: [x], sequence)
    case1 = addHeads(sequence[0], subsets(sequence[1:], n-1))
    case2 = subsets(sequence[1:], n)
    return case1 + case2


def addHeads(head, sequence):
    results = []
    for s in sequence:
        results.append([head] + s)
    return results
###



By the way, Donald Knuth has been writing Volume Four of his "Art of
Computer Programming", and he's put up the fascicles for his next two
chapters online:

    http://www-cs-faculty.stanford.edu/~knuth/fasc2a.ps.gz
    http://www-cs-faculty.stanford.edu/~knuth/fasc2b.ps.gz

Lots of good algorithm stuff to look at!  *grin*


Good luck to you!