Recursive generator for combinations of a multiset?

Oscar Benjamin oscar.j.benjamin at gmail.com
Wed Nov 27 17:50:04 EST 2013


On 27 November 2013 10:25, John O'Hagan <research at johnohagan.com> wrote:
> On Tue, 26 Nov 2013 10:33:06 +0000
> Oscar Benjamin <oscar.j.benjamin at gmail.com> wrote:
>
> I simplified it a bit more to this:
>
> def word_sequences(prepend, target, subwords):
>     """subwords is a list of lists of subwords grouped by length,
>         in order of length"""
>     for words in subwords:
>         for word in words:
>             recurse_target = subset_subtract(target, word)
>             if recurse_target:
>                 yield from word_sequences(prepend + ' ' + word,
>                         recurse_target, subwords[:len(recurse_target)])
>             elif recurse_target == '':
>                 yield prepend + ' ' + word
>
> with just one function to do the subset testing:
>
> def subset_subtract(target, word):
>     for i in word:
>         if i in target:
>             target = target.replace(i, '' ,1)
>         else:
>             return
>     return target
>
> Speed-wise it is somewhat faster than any of my non-duplicate-producing
> attempts, but still way slower than the current champion, the redundant
> cartesian product-only version.
>
> However, because it iterates through all the remaining words on each
> recursion, it seems to produce n! of each unique result, where n in the
> number of words in the result, so this is the new champion as far as
> redundancy is concerned. I'll keep working on it, the totally
> different approach is interesting.

Whoops, I guess this is what happens when you send untested
(pseudo-)code out. It needs an outer helper function that can do
something like:

def word_sequences_top(target, subwords):
    for word in copy(subwords):
        recurse_target = multiset_subtrace(target,word)
        yield from word_sequences(words, recurse_target, subwords)
        remove_word_from_list(word, subwords)

This way we yield all matches involving the word once and then go on
to all matches that don't include the word.

Also the partition length logic from your original version can be used
in word_sequences to prune recursion branches.


Oscar



More information about the Python-list mailing list