Recursive generator for combinations of a multiset?

Oscar Benjamin oscar.j.benjamin at gmail.com
Mon Nov 25 07:15:15 EST 2013


On 21 November 2013 13:01, John O'Hagan <research at johnohagan.com> wrote:
> In my use-case the first argument to multicombs is a tuple of words
> which may contain duplicates, and it produces all unique combinations
> of a certain length of those words, eg:
>
> list(multicombs(('cat', 'hat', 'in', 'the', 'the'), 3))
>
> [('cat', 'hat', 'in'), ('cat', 'hat', 'the'), ('cat', 'in', 'the'),
> ('cat', 'the', 'the'), ('hat', 'in', 'the'), ('hat', 'the', 'the'),
> ('in', 'the', 'the')]

I still don't understand what you're actually doing well enough to
know whether there is a better general approach to the problem. For
the specific thing you requested, here is a recursive multiset
combinations generator. Does it do what you wanted?

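#!/usr/bin/env python3
# Needs Python 3.3+ ("yield from" and starred unpacking).

def multicombs(it, r):
    # Collapse runs of equal items into [word, count] entries.
    # Equal items must be adjacent in the input (e.g. sorted).
    words = []
    for word in it:
        if words and word == words[-1][0]:
            words[-1][1] += 1
        else:
            words.append([word, 1])
    # Append to each entry the number of items that come after it,
    # giving [word, count, rem].
    cumulative = 0
    for n in range(len(words)-1, -1, -1):
        words[n].append(cumulative)
        cumulative += words[n][1]
    return _multicombs((), words, r)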

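def _multicombs(prepend, words, r):
    if r == 0:
        yield prepend
        return
    if not words:
        return  # empty input with r > 0: nothing to yield
    (word, count, rem), *remaining = words
    # Take k copies of word: at least r-rem, so that the later words can
    # still supply the rest, and at most min(count, r).
    for k in range(max(r-rem, 0), min(count, r) + 1):
        yield from _multicombs(prepend + (word,) * k, remaining, r-k)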

expected = [
       ('cat', 'hat', 'in'),
       ('cat', 'hat', 'the'),
       ('cat', 'in', 'the'),
       ('cat', 'the', 'the'),
       ('hat', 'in', 'the'),
       ('hat', 'the', 'the'),
       ('in', 'the', 'the'),
    ]

output = list(multicombs(('cat', 'hat', 'in', 'the', 'the'), 3))

assert len(expected) == len(output)
assert set(expected) == set(output)  # The order is different
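
For cross-checking on small inputs, a brute-force reference can be sketched
with itertools.combinations. This is not part of the generator above; the
name multicombs_bruteforce is made up for illustration, and it builds every
positional combination before discarding duplicates, so it only suits short
inputs.

```python
from itertools import combinations

def multicombs_bruteforce(items, r):
    """Return all distinct r-combinations of a multiset, sorted.

    Hypothetical helper for testing only: it enumerates every positional
    combination and then removes duplicates.
    """
    # Sorting first makes equal items adjacent, so every combination comes
    # out in one canonical order and duplicates compare equal as tuples.
    return sorted(set(combinations(sorted(items), r)))

result = multicombs_bruteforce(('cat', 'hat', 'in', 'the', 'the'), 3)
for combo in result:
    print(combo)
```

Because the result is sorted, it can be compared directly against a sorted
copy of whatever another implementation produces.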


Oscar


