Recursive generator for combinations of a multiset?

Fri Nov 22 19:58:38 EST 2013

On Thu, 21 Nov 2013 12:59:26 -0800
Dan Stromberg <drsalists at gmail.com> wrote:

> On Wed, Nov 20, 2013 at 10:46 PM, John O'Hagan
> <research at johnohagan.com>wrote:
> 
> >
> > Short story: the subject says it all, so if you have an answer
> > already, fire away. Below is the long story of what I'm using it
> > for, and why I think it needs to be recursive. It may even be of
> > more general interest in terms of filtering the results of
> > generators.
> >
> 
> I think you probably need permutations rather than combinations.
> 
> Also, I think you'll need to form a word (partitioned off by spaces),
> and then check it against a set containing /usr/share/dict/words
> before recursing for the remainder of the sentence - this should
> speed things up a LOT.

Thanks for the reply. If I understand you correctly, you are suggesting
permuting the input _characters_ to form words and then seeing if
they exist, as opposed to my approach of combining known words and
seeing if they are anagrams. (Permutations of words would not help find
anagrams as they merely change the word order). Here is an attempt at
that:

def anagrams(partition, input_string):
    """Find anagrams which fit given partition of input string length"""
    if not partition:
        yield (), input_string
        return
    for words, checkstring in anagrams(partition[:-1], input_string):
        for word in itertools.permutations(checkstring, partition[-1]):
            word = ''.join(word)
            if word in WORDS: #WORDS is collection of dictionary words
                newstring = checkstring
                for l in word:
                    newstring = newstring.replace(l, '' , 1)
                yield words + (word,), newstring

There are two problems with this. If there are repeated characters in
the input, redundant results are produced; a multiset-permutation
algorithm would fix this. But the main problem is it is incredibly
slow: on my run-of-the-mill laptop, it chokes on anything longer than
about 10 characters, spending most of its time rejecting non-words.

Or have I misunderstood your suggestion?

Regards,

--

John