Recursive generator for combinations of a multiset?

John O'Hagan research at johnohagan.com
Sat Nov 23 21:28:58 EST 2013


On Sat, 23 Nov 2013 04:23:42 +0000
MRAB <python at mrabarnett.plus.com> wrote:

> On 23/11/2013 00:58, John O'Hagan wrote:
> > On Thu, 21 Nov 2013 12:59:26 -0800
> > Dan Stromberg <drsalists at gmail.com> wrote:
> >
> >> On Wed, Nov 20, 2013 at 10:46 PM, John O'Hagan
> >> <research at johnohagan.com>wrote:
> >>
> >> >
> >> > Short story: the subject says it all, so if you have an answer
> >> > already, fire away. Below is the long story of what I'm using it
> >> > for, and why I think it needs to be recursive. It may even be of
> >> > more general interest in terms of filtering the results of
> >> > generators.
> >> >
> >>
> >> I think you probably need permutations rather than combinations.
> >>
> >> Also, I think you'll need to form a word (partitioned off by
> >> spaces), and then check it against a set
> >> containing /usr/share/dict/words before recursing for the
> >> remainder of the sentence - this should speed things up a LOT.
> >
> > Thanks for the reply. If I understand you correctly, you are
> > suggesting permuting the input _characters_ to form words and then
> > seeing if they exist, as opposed to my approach of combining known
> > words and seeing if they are anagrams. (Permutations of words would
> > not help find anagrams as they merely change the word order). Here
> > is an attempt at that:
> >
> > def anagrams(partition, input_string):
> >      """Find anagrams which fit given partition of input string
> > length""" if not partition:
> >          yield (), input_string
> >          return
> >      for words, checkstring in anagrams(partition[:-1],
> > input_string): for word in itertools.permutations(checkstring,
> > partition[-1]): word = ''.join(word)
> >              if word in WORDS: #WORDS is collection of dictionary
> > words newstring = checkstring
> >                  for l in word:
> >                      newstring = newstring.replace(l, '' , 1)
> >                  yield words + (word,), newstring
> >
> > There are two problems with this. If there are repeated characters
> > in the input, redundant results are produced; a multiset-permutation
> > algorithm would fix this. But the main problem is it is incredibly
> > slow: on my run-of-the-mill laptop, it chokes on anything longer
> > than about 10 characters, spending most of its time rejecting
> > non-words.
> >
> > Or have I misunderstood your suggestion?
> >
> If you want to know how to get unique permutations, have a look here:
> 
> http://mail.python.org/pipermail/python-ideas/2013-October/023610.html
> 

For this particular problem I don't need multiset permutations but
multiset combinations (see original post). But that thread was a good
read.

This is a little OT, but I did need multiset permutations a couple of
years back for a project involving the generation of musical
structures. There was zero interest here at the time (fair enough!) and
I ended up implementing the C++ function "next_permutation" in Python.

So it was good to read in that thread that there seems to be some
interest in incorporating multiset combinatorics into itertools
(including your excellent contribution). IMHO the scepticism there about
non-exotic use-cases was unjustified. Leaving aside my probably atypical
uses, it crops in many ordinary situations dealing with arrangements of
multiple items of several types where each instance of a type is
equivalent. Take stock control: when stacking a warehouse shelf it
doesn't matter which particular box goes where, only how many of each
size. Or timetabling: if Mrs Smith teaches the same students maths on
Tuesdays and Thursdays, swapping the classes does nothing. 

The same goes for cyclic and cyclic-multiset ("necklaces")
combinatorics, where the beginning and end of an arrangement is not
significant, eg. 24-hour rostering, laying tiles, etc. And musical
scales.

Disclaimer: I am far from expert on combinatorics but seem to end up
using it a lot.

--

John




More information about the Python-list mailing list