Recursive generator for combinations of a multiset?

MRAB python at mrabarnett.plus.com
Fri Nov 22 23:23:42 EST 2013


On 23/11/2013 00:58, John O'Hagan wrote:
> On Thu, 21 Nov 2013 12:59:26 -0800
> Dan Stromberg <drsalists at gmail.com> wrote:
>
>> On Wed, Nov 20, 2013 at 10:46 PM, John O'Hagan
>> <research at johnohagan.com>wrote:
>>
>> >
>> > Short story: the subject says it all, so if you have an answer
>> > already, fire away. Below is the long story of what I'm using it
>> > for, and why I think it needs to be recursive. It may even be of
>> > more general interest in terms of filtering the results of
>> > generators.
>> >
>>
>> I think you probably need permutations rather than combinations.
>>
>> Also, I think you'll need to form a word (partitioned off by spaces),
>> and then check it against a set containing /usr/share/dict/words
>> before recursing for the remainder of the sentence - this should
>> speed things up a LOT.
>
> Thanks for the reply. If I understand you correctly, you are suggesting
> permuting the input _characters_ to form words and then seeing if
> they exist, as opposed to my approach of combining known words and
> seeing if they are anagrams. (Permutations of words would not help find
> anagrams as they merely change the word order). Here is an attempt at
> that:
>
> def anagrams(partition, input_string):
>      """Find anagrams which fit given partition of input string length"""
>      if not partition:
>          yield (), input_string
>          return
>      for words, checkstring in anagrams(partition[:-1], input_string):
>          for word in itertools.permutations(checkstring, partition[-1]):
>              word = ''.join(word)
>              if word in WORDS: #WORDS is collection of dictionary words
>                  newstring = checkstring
>                  for l in word:
>                      newstring = newstring.replace(l, '' , 1)
>                  yield words + (word,), newstring
>
> There are two problems with this. If there are repeated characters in
> the input, redundant results are produced; a multiset-permutation
> algorithm would fix this. But the main problem is it is incredibly
> slow: on my run-of-the-mill laptop, it chokes on anything longer than
> about 10 characters, spending most of its time rejecting non-words.
>
> Or have I misunderstood your suggestion?
>
If you want to know how to get unique permutations, have a look here:

http://mail.python.org/pipermail/python-ideas/2013-October/023610.html




More information about the Python-list mailing list