Generating combinations (was RE: automatic "lotjes trekken")

Tim Peters tim_one at email.msn.com
Thu Dec 9 21:54:59 EST 1999


[François Pinard asks whether there's a std helper available to generate
 permutations or combinations.
 Tim sez not that he knows of, but he has a bunch of inconsistent code for
 stuff "like that"; notes that full polymorphism costs more than working
 only with lists; and says he expects to clean up his code some day & make
 it available]

Which is a bigger project than I can make time for now, so here's a
cleaned-up version of just one combinatorial family (combinations).  I'm
only going to post the module docstring here; mail me if you want a beta and
I'll mail back the whole module (about 20Kb).  Once I'm happy with the
design, I'll upload it somewhere.  It will be in the public domain.

The main ideas so far are:

1) There's a tension between fast-but-limited lists of "canonical ints",
   and routines that work with any kind of sequence.  This is resolved
   here by supplying both:  class CombGenBasic for the former, and
   CombGen for the latter.

2) Lexicographic order, Gray code order, and random selection are all
   important in different apps.  So each class supports all three, under
   method names .getlex(), .getgray() and .getrand().

3) I've never before released code that relies on my doctest.py framework
   for testing, but it saves me so much time I'm ending that policy.  In
   return, you can be certain that every example in every docstring behaves
   exactly as shown.  But you won't be able to run the tests yourself
   without downloading the doctest framework (see Python FTP Contrib).

4) Let the user supply a different random number generator.

I expect all 4 points will apply to other combinatorial families too, so
before I push on I'd like interested people to try this interface and gripe
about what they hate.

or-gripe-about-what-they-love-sometimes-it's-hard-to-tell-ly y'rs
    - tim

# Module combgen version 0.0.1
# Released to the public domain 09-Dec-1999,
# by Tim Peters (tim_one at email.msn.com).

# Provided as-is; use at your own risk; no warranty; no promises; enjoy!

"""\
CombGen(s, k) supplies methods for generating k-combinations from s.

CombGenBasic(n, k) acts like CombGen(range(n), k) but is more
efficient.

s is of any sequence type such that s supports catenation (s1 + s2)
and slicing (s[i:j]).  For example, s can be a list, tuple or string.

k is an integer such that 0 <= k <= len(s).

A k-combination of s is a subsequence C of s where len(C) = k, and for
some k integers i_0, i_1, ..., i_km1 (km1 = k-1) with
0 <= i_0 < i_1 < ... < i_km1 < len(s),

    C[0] is s[i_0]
    C[1] is s[i_1]
    ...
    C[k-1] is s[i_km1]

Note that each k-combination is a sequence of the same type as s.

Different methods generate k-combinations in lexicographic index
order, a particular "Gray code" order, or at random.

The constructor saves a reference to (not a copy of) s, so don't
mutate s after calling CombGen.  Also a bad idea to mix calls to
getlex() with getgray() (they share internal state, but a call to one
may not update the state used by the other).

Module function comb(n, k) returns the number of combinations of n
things taken k at a time; n >= k >= 0 required.

The .reset() method can be used to start over.

The .set_start(ivector) method can be used to force generation to
begin at a particular combination.


GETLEX -- LEXICOGRAPHIC GENERATION

Each invocation of .getlex() returns a new k-combination of s.  The
combinations are generated in lexicographic index order (for
CombGenBasic, the k-combinations themselves are in lexicographic
order).  That is, the first k-combination consists of

    s[0], s[1], ..., s[k-1]

in that order; the next of

    s[0], s[1], ..., s[k]

and so on until reaching

    s[len(s)-k], s[len(s)-k+1], ..., s[len(s)-1]

After all k-combinations have been generated, .getlex() returns None.

Examples:

>>> g = CombGen("abc", 0).getlex
>>> g(), g()
('', None)

>>> g = CombGen("abc", 1).getlex
>>> g(), g(), g(), g()
('a', 'b', 'c', None)

>>> g = CombGenBasic(3, 2).getlex
>>> g(), g(), g(), g()
([0, 1], [0, 2], [1, 2], None)

>>> g = CombGen((0, 1, 2), 3).getlex
>>> g(), g(), g()
((0, 1, 2), None, None)

>>> p = CombGenBasic(4, 2)
>>> g = p.getlex
>>> g(), g(), g(), g(), g(), g(), g(), g()
([0, 1], [0, 2], [0, 3], [1, 2], [1, 3], [2, 3], None, None)
>>> p.reset()
>>> g(), g(), g(), g(), g(), g(), g(), g()
([0, 1], [0, 2], [0, 3], [1, 2], [1, 3], [2, 3], None, None)
>>>


GETGRAY -- GRAY CODE GENERATION

Each invocation of .getgray() returns a triple

    C, tossed, added

where
    C is the next k-combination of s
    tossed is the element of s removed from the last k-combination
    added is the element of s added to the last k-combination

tossed and added are None for the first call.

Consecutive combinations returned by .getgray() differ by two elements
(one removed, one added).  If you invoke getgray() more than comb(n,k)
times, it "wraps around" and generates the same sequence again.  Note
that the last combination in the return sequence also differs by two
elements from the first combination in the return sequence.

Gray code ordering can be very useful when you're computing an
expensive function on each combination:  that exactly one element is
added and exactly one removed can often be exploited to save
recomputation for the k-2 common elements.

>>> o = CombGen("abcd", 2)
>>> for i in range(7):  # note that this wraps around
...     print o.getgray()
('ab', None, None)
('bd', 'a', 'd')
('bc', 'd', 'c')
('cd', 'b', 'd')
('ad', 'c', 'a')
('ac', 'd', 'c')
('ab', 'c', 'b')
>>>


GETRAND -- RANDOM GENERATION

Each invocation of .getrand() returns a random k-combination.

>>> o = CombGenBasic(1000, 6)
>>> import random
>>> random.seed(87654)
>>> o.getrand()
[69, 223, 437, 573, 722, 778]
>>> o.getrand()
[409, 542, 666, 703, 732, 847]
>>> CombGenBasic(1000000, 4).getrand()
[199449, 439831, 606885, 874530]
>>>
"""






More information about the Python-list mailing list