Recursive generator for combinations of a multiset?

John O'Hagan research at johnohagan.com
Sat Nov 23 21:57:34 EST 2013


On Fri, 22 Nov 2013 22:33:29 -0800
Dan Stromberg <drsalists at gmail.com> wrote:

> On Fri, Nov 22, 2013 at 4:58 PM, John O'Hagan
> <research at johnohagan.com>wrote:
> 
> > On Thu, 21 Nov 2013 12:59:26 -0800
> > Dan Stromberg <drsalists at gmail.com> wrote:
> >
> > > On Wed, Nov 20, 2013 at 10:46 PM, John O'Hagan
> > > <research at johnohagan.com>wrote:
> > >
> > > >
> > > > Short story: the subject says it all, so if you have an answer
> > > > already, fire away. Below is the long story of what I'm using it
> > > > for, and why I think it needs to be recursive. It may even be of
> > > > more general interest in terms of filtering the results of
> > > > generators.
> > > >
> > >
> > > I think you probably need permutations rather than combinations.
> > >
> > > Also, I think you'll need to form a word (partitioned off by
> > > spaces), and then check it against a set
> > > containing /usr/share/dict/words before recursing for the
> > > remainder of the sentence - this should speed things up a LOT.
> >
> > Thanks for the reply. If I understand you correctly, you are
> > suggesting permuting the input _characters_ to form words and then
> > seeing if they exist, as opposed to my approach of combining known
> > words and seeing if they are anagrams. (Permutations of words would
> > not help find anagrams as they merely change the word order). Here
> > is an attempt at that:
> 
> 
> You've interpreted me correctly.
> 
> However, I was thinking about this in the back of my mind, and
> decided it would probably be best to inhale /usr/share/dict/words (if
> on Linux), and pull out words of the corrects lengths (as separated
> by the blanks) over the correct (possible) alphabet, and permute
> Those, afterward checking if they form good anagrams of the original
> sentence.  This would probably be much faster, since English isn't
> that dense of a space.

If you look back at my original question, you'll see that's pretty much
what I've done. I didn't spell out all the details, but  I made a
dictionary of wordlength keys containing lists of all dictionary words
of that length made of the correct sub-alphabet.

But to to recap the problem: to produce non-redundant anagram phrases, I
need the cartesian product (not permutations) of these lists if each is
unique, but with a subroutine producing multiset combinations if a list
is repeated, i.e., if a particular length is called for more than once.
The version I have so far is correct but substantially slower than the
product-only one, which just goes ahead and produces all the redundant
results. This seems counter-intuitive, and my theory is that this is
because I am unable to "prune" the non-recursive combination algorithm
I currently have.

Regards,

--

John



More information about the Python-list mailing list