[Python-Dev] Re: A `cogen' module [was: Re: PEP 218 (sets); moving set.py to Lib]

Tim Peters tim.one@comcast.net
Wed, 28 Aug 2002 15:36:57 -0400



[Guido]
> I'm not saying that it looks good overall -- I'd like to defer to Tim,
> who has used and written this kind of utilities for real, and who
> probably has a lot of useful feedback.  Right now, he's felled by some
> kind of illness though.

I think that's over!  I'm very tired, though (couldn't get to sleep until
11, and woke up at 2 with the last, umm, episode <wink>).

This is a Big Project if done right.  I volunteered time for it a few years
ago, but there wasn't enough interest then to keep it going.  I'll attach
the last publicly-distributed module I had then, solely devoted to
combinations.  It was meant to be the first in a series, all following some
basic design decisions:

+ A Basic class that doesn't compromise on speed, typically by working
  on canonical representatives in Python list-of-int form.

+ A more general class that deals with arbitrary sequences, perhaps
  at great loss of efficiency.

+ Multiple iterators are important:  lex order is needed sometimes;
  Gray code order is an enormous help sometimes; random generation is
  vital sometimes.

+ State-of-the-art algorithms.  That's a burden for anything that
  goes into the core -- if it's a toy algorithm, users can
  do just as well on their own, and then people submit patch after
  patch that the original author isn't really qualified to judge
  (else they would have done a state-of-the-art thing to begin with).

+ The ability to override the random number generator (a toy sketch
  follows this list).  Python's default Wichmann-Hill generator is
  showing its age as machines get faster;
  it's simply not adequate anymore for long-running programs making
  heavy use of it on a fast box.  Combinatorial algorithms in
  particular do tend to make heavy use of it.  (Speaking of which,
  "someone" should look into grabbing one of the Mersenne Twister
  extensions for Python -- that's the current state of *that* art).
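
For concreteness, here's the kind of hook I mean; the attached
module's getrand already takes such an argument.  ToyLCG is just an
illustrative stand-in for a real generator (say, a Mersenne Twister
wrapper), not something I'd actually ship:

    from combgen import CombGenBasic

    class ToyLCG:
        # Toy linear congruential generator -- illustration only.
        def __init__(self, seed=12345L):
            self.state = seed

        def random(self):
            # Return a random float in [0., 1.).
            self.state = (self.state * 69069 + 12345) % 4294967296L
            return self.state / 4294967296.0

    gen = CombGenBasic(1000, 6)
    print gen.getrand(random=ToyLCG().random)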

Ideas not worth taking:

+ Leave the chi-square algorithm out of it.  A better implementation
  would be nice to have in a statistics package, but it doesn't
  belong here regardless.

me-i'm-going-back-to-sleep-ly y'rs  - tim

[Attachment: combgen.py]

# Module combgen version 0.9.1
# Released to the public domain 18-Dec-1999,
# by Tim Peters (tim_one@email.msn.com).

# Provided as-is; use at your own risk; no warranty; no promises; enjoy!

"""\
CombGen(s, k) supplies methods for generating k-combinations from s.

CombGenBasic(n, k) acts like CombGen(range(n), k) but is more
efficient.

s is of any sequence type such that s supports catenation (s1 + s2)
and slicing (s[i:j]).  For example, s can be a list, tuple or string.

k is an integer such that 0 <= k <= len(s).

A k-combination of s is a subsequence C of s where len(C) = k, and for
some k integers i_0, i_1, ..., i_km1 (km1 = k-1) with
0 <= i_0 < i_1 < ... < i_km1 < len(s),

    C[0] is s[i_0]
    C[1] is s[i_1]
    ...
    C[k-1] is s[i_km1]

Note that each k-combination is a sequence of the same type as s.

Different methods generate k-combinations in lexicographic index
order, a particular "Gray code" order, or at random.

The .reset() method can be used to start over.

The .set_start(ivector) method can be used to force generation to
begin at a particular combination.

Module function comb(n, k) returns the number of combinations of n
things taken k at a time; n >= k >= 0 required.

CAUTIONS

+ The CombGen constructor saves a reference to (not a copy of) s, so
  don't mutate s after calling CombGen.

+ For efficiency, CombGenBasic getlex and getgray return the *same*
  list each time, mutating it in place.  You must not mutate this
  list; and, if you want to save a combination's value across calls,
  copy the list.  For example,

>>> g = CombGenBasic(2, 1)
>>> x = g.getlex(); y = g.getlex()
>>> x is y  # the same!
1
>>> x, y    # so these print the same thing
([1], [1])
>>> g.reset()
>>> x = g.getlex()[:]; y = g.getlex()[:]
>>> x, y    # copies work as expected
([0], [1])
>>>

In contrast, CombGen methods return a new sequence each time -- but
they're slower.


GETLEX -- LEXICOGRAPHIC GENERATION

Each invocation of .getlex() returns a new k-combination of s.  The
combinations are generated in lexicographic index order (for
CombGenBasic, the k-combinations themselves are in lexicographic
order).  That is, the first k-combination consists of

    s[0], s[1], ..., s[k-1]

in that order; the next of

    s[0], s[1], ..., s[k-2], s[k]

and so on until reaching

    s[len(s)-k], s[len(s)-k+1], ..., s[len(s)-1]

After all k-combinations have been generated, .getlex() returns None.

Examples:

>>> g = CombGen("abc", 0).getlex
>>> g(), g()
('', None)

>>> g = CombGen("abc", 1).getlex
>>> g(), g(), g(), g()
('a', 'b', 'c', None)

>>> g = CombGenBasic(3, 2).getlex
>>> print g(), g(), g(), g()
[0, 1] [0, 2] [1, 2] None

>>> g = CombGen((0, 1, 2), 3).getlex
>>> print g(), g(), g()
(0, 1, 2) None None

>>> p = CombGenBasic(4, 2)
>>> g = p.getlex
>>> print g(), g(), g(), g(), g(), g(), g(), g()
[0, 1] [0, 2] [0, 3] [1, 2] [1, 3] [2, 3] None None
>>> p.reset()
>>> print g(), g(), g(), g(), g(), g(), g(), g()
[0, 1] [0, 2] [0, 3] [1, 2] [1, 3] [2, 3] None None
>>>


GETGRAY -- GRAY CODE GENERATION

Each invocation of .getgray() returns a triple

    C, tossed, added

where
    C is the next k-combination of s
    tossed is the element of s removed from the last k-combination
    added is the element of s added to the last k-combination

tossed and added are None for the first call.

Consecutive combinations returned by .getgray() differ by two elements
(one removed, one added).  If you invoke getgray() more than comb(n,k)
times, it "wraps around" and generates the same sequence again.  Note
that the last combination in the return sequence also differs by two
elements from the first combination in the return sequence.

Gray code ordering can be very useful when you're computing an
expensive function on each combination:  that exactly one element is
added and exactly one removed can often be exploited to save
recomputation for the k-1 common elements.

>>> o = CombGen("abcd", 2)
>>> for i in range(7):  # note that this wraps around
...     print o.getgray()
('ab', None, None)
('bd', 'a', 'd')
('bc', 'd', 'c')
('cd', 'b', 'd')
('ad', 'c', 'a')
('ac', 'd', 'c')
('ab', 'c', 'b')
>>>
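
For example, an expensive function of the current combination -- here
just the sum of its elements, as a stand-in -- can be maintained
incrementally from the tossed/added pair instead of being recomputed
from scratch:

>>> g = CombGenBasic(4, 2)
>>> c, tossed, added = g.getgray()
>>> total = c[0] + c[1]       # pay the full price only once
>>> for i in range(5):
...     c, tossed, added = g.getgray()
...     total = total - tossed + added    # cheap incremental update
>>> total, c[0] + c[1]        # the running total stays correct
(2, 2)
>>>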


GETRAND -- RANDOM GENERATION

Each invocation of .getrand() returns a random k-combination.

>>> o = CombGenBasic(1000, 6)
>>> import random
>>> random.seed(87654)
>>> o.getrand()
[69, 223, 437, 573, 722, 778]
>>> o.getrand()
[409, 542, 666, 703, 732, 847]
>>> CombGenBasic(1000000, 4).getrand()
[199449, 439831, 606885, 874530]
>>>
"""

# 0,0,1    09-Dec-1999
#    initial version
# 0,0,2    10-Dec-1999
#    Sped CombGenBasic.{getlex, getgray} substantially by no longer
#    making copies of the indices; getgray is truly O(1) now.
#    A bad aspect is that they return the same list object each time
#    now, which can be confusing; e.g., had to change some examples.
#    Use CombGen instead if this bothers you -- CombGenBasic's
#    purpose in life is to be lean & mean.
#    Removed the restriction on mixing calls to CombGenBasic's
#    getlex and getgray; not sure it's useful, but it was irksome.
#    Changed __findj to return a simpler result.  This is less useful
#    for getgray, but now getlex can exploit it too (there are no
#    longer any Python-level loops in CombGenBasic's getlex; there's
#    an implied C-level loop (via "range"), and it's in the nature of
#    lex order that this can't be removed).
#    Added some exhaustive tests for getlex, and finger verification.
# 0,9,1    18-Dec-1999
#    Changed _testrand to compute and print chi-square statistics,
#    and probabilities, because one of _testrand's outputs didn't
#    "look random" to me.  Indeed, it's got a poor chi-square value!
#    But sometimes that *should* happen, and it does not appear to
#    be happening more often than expected.

__version__ = 0, 9, 1

def _chop(n):
    """n -> int if it fits, else long."""

    try:
        return int(n)
    except OverflowError:
        return n

def comb(n, k):
    """n, k -> number of combinations of n items, k at a time.

    n >= k >= 0 required.

    >>> for i in range(7):
    ...     print "comb(6, %d) ==" % i, comb(6, i)
    comb(6, 0) == 1
    comb(6, 1) == 6
    comb(6, 2) == 15
    comb(6, 3) == 20
    comb(6, 4) == 15
    comb(6, 5) == 6
    comb(6, 6) == 1
    >>> comb(52, 5)   # number of poker hands
    2598960
    >>> comb(52, 13)  # number of bridge hands
    635013559600L
    """

    if not n >= k >= 0:
        raise ValueError("n >= k >= 0 required: " + `n, k`)
    if k > (n >> 1):
        k = n-k
    if k == 0:
        return 1
    result = long(n)
    i = 2
    n, k = n-1, k-1
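    # Loop invariant:  at the top of each pass, result == comb(n0, i-1)
    # and n == n0-i+1, where n0 is the n passed in.  Since
    # comb(n0, i-1) * (n0-i+1) == i * comb(n0, i), the division below
    # is exact -- hence the commented-out assert.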
    while k:
        # assert (result * n) % i == 0
        result = result * n / i
        i = i+1
        k = k-1
        n = n-1
    return _chop(result)

import random

class CombGenBasic:

    def __init__(self, n, k):
        self.n, self.k = n, k
        if not n >= k >= 0:
            raise ValueError("n >= k >= 0 required:" + `n, k`)
        self.reset()

    def reset(self):
        """Restore state to that immediately after construction."""
        # The first result is the same for either lexicographic or
        # Gray code generation.
        self.set_start(range(self.k))

    # __findj is used only to initialize self.j for getlex and
    # getgray.  It returns the largest j such that slot j has
    # "breathing room"; that is, such that slot j isn't at its largest
    # possible value (n-k+j).  j is -1 if no such index exists.
    # After initialization, getlex and getgray incrementally update
    # this more efficiently.
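    #
    # Example:  with n=5, k=3 and v == [1, 3, 4], slots 2 and 1 are
    # already at their maximum values (4 == n-k+2 and 3 == n-k+1), but
    # slot 0 is not (1 < n-k+0 == 2), so __findj returns 0.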

    def __findj(self, v):
        n, k = self.n, self.k
        assert len(v) == k
        j = k-1
        while j >= 0 and v[j] == n-k+j:
            # v[j] is at its largest possible value
            j = j-1
        return j

    def getlex(self):
        """Return next (in lexicographic order) k-combination.

        Return None if all possibilities have been generated.

        Caution:  getlex returns the *same* list each time, mutated
        in place.  Don't mutate it yourself, or save a reference to it
        (the next call will mutate its contents; make a copy if you
        need to save the value across calls).
        """

        indices, n, k, j = self.indices, self.n, self.k, self.j
        if self.firstcall:
            self.firstcall = 0
            return indices
        if j < 0:
            return None
        new = indices[j] = indices[j] + 1
        if j+1 == k:
            if new + 1 == n:
                j = j-1
        else:
            if new + 1 < indices[j+1]:
                indices[j:] = range(new, new + k - j)
                j = k-1
            else:
                j = j-1

        self.j = j
        # assert j == self.__findj(indices)
        return indices

    def getgray(self):
        """Return next (c, tossed, added) triple.

        c is the next k-combination in a particular Gray code order.
        tossed is the element of range(n) removed from the last
        combination.
        added is the element of range(n) added to the last
        combination.

        tossed and added are None if this is the first call, or on
        every call if there is only one k-combination.  Else tossed !=
        added, and neither is None.

        Caution:  getgray wraps around if you invoke it more than
        comb(n, k) times.

        Caution:  getgray returns the *same* list each time, mutated
        in place.  Don't mutate it yourself, or save a reference to it
        (the next call will mutate its contents; make a copy if you
        need to save the value across calls).
        """

        # The popular routine in Nijenhuis & Wilf's "Combinatorial
        # Algorithms" is exceedingly complicated (although trivial
        # to program with recursive generators!).
        #
        # Instead I'm using a variation of Algorithm A3 from the paper
        # "Loopless Gray Code Algorithms", by T.A. Jenkyns (Brock
        # University, Ontario).  The code is much simpler, and,
        # because it's loop-free, takes O(1) time on each call (not
        # just amortized over the whole sequence).
        #
        # Because the paper doesn't yet seem to be well known, here's
        # the idea:  Modify the definition of lexicographic ordering
        # in a funky way:  in the element comparisons, replace "<" by
        # ">" in every other element position starting at the 2nd.
        # IOW, and skipping end cases, sequence s is "less than"
        # sequence t iff their elements are equal up until index
        # position i, and then s[i] < t[i] if i is even, or s[i] >
        # t[i] if i is odd.  Jenkyns calls this "alternating
        # lexicographic" order.  It's clear that this defines a total
        # ordering.  What isn't obvious is that it's also a Gray code
        # ordering!  Very pretty.
        #
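        # For example, with n=4 and k=2 the alternating lexicographic
        # order is
        #
        #     [0, 3]  [0, 2]  [0, 1]  [1, 3]  [1, 2]  [2, 3]
        #
        # and consecutive combinations (including the wrap-around from
        # [2, 3] back to [0, 3]) differ by exactly one element tossed
        # and one element added.
        #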
        # Modifications made here to A3 are minor, and include
        # switching from 1-based to 0-based; allowing for trivial
        # sequences; allowing for wrap-around; returning the "tossed"
        # and "added" elements; starting the generation at an
        # arbitrary k-combination; and sharing a finger (self.j) with
        # the getlex method.

        indices, n, k, j = self.indices, self.n, self.k, self.j
        if self.firstcall:
            self.firstcall = 0
            return indices, None, None

        # Slide over to first slot that *may* be able to move down.
        # Note that this leaves odd j alone (including -1!), and may
        # make j equal to k.
        j = j | 1

        if j == k:
            # k is odd and so indices[-1] "wants to move up", and
            # indices[-1] < n-1 so it *can* move up.
            tossed = indices[-1]
            added = indices[-1] = tossed + 1
            j = j-1
            if added == n-1:
                j = j-1
        elif j < 0:
            # indices has the last value in alt-lex order, e.g.
            # [4, 5, 6, 7]; wrap around to the first value, e.g.
            # [0, 5, 6, 7].
            assert indices == range(n-k, n)
            if k and indices[0]:
                tossed = indices[0]
                added = indices[0] = 0
                j = 0
            else:
                # comb(n, k) is 1 -- this is a trivial sequence.
                tossed = added = None
        else:
            # 0 < j < k (note that 0 < j because j is odd).
            # Want to move this slot down (again because j is odd).
            atj = indices[j]
            if indices[j-1] + 1 == atj:
                # can't move it down; move preceding up
                tossed = atj - 1    # the value in indices[j-1]
                indices[j-1] = atj
                added = indices[j] = n-k+j
                j = j-1
                if atj + 1 == added:
                    j = j-1
            else:
                # can move it down
                tossed = atj
                added = indices[j] = atj - 1
                if j+1 < k:
                    tossed = indices[j+1]
                    indices[j+1] = atj
                    j = j+1

        self.j = j
        # assert j == self.__findj(indices)
        return indices, tossed, added

    def set_start(self, start):
        """Force .getlex() or .getgray() to start at given value.

        start is a vector of k unique integers in range(n), where
        k and n were passed to the CombGenBasic constructor.

        The vector is sorted in increasing order, and is used as the
        next k-combination to be returned by .getlex() or
        .getgray().

        >>> gen = CombGenBasic(3, 2)
        >>> for i in range(4):
        ...     print gen.getgray()
        ([0, 1], None, None)
        ([1, 2], 0, 2)
        ([0, 2], 1, 0)
        ([0, 1], 2, 1)

        >>> gen.set_start([0, 2])
        >>> for i in range(4):
        ...     print gen.getgray()
        ([0, 2], None, None)
        ([0, 1], 2, 1)
        ([1, 2], 0, 2)
        ([0, 2], 1, 0)
        """

        if len(start) != self.k:
            raise ValueError("start vector not of length " + `k`)
        indices = start[:]
        indices.sort()
        seen = {}
        # Verify the vector makes sense.
        for i in indices:
            if not 0 <= i < self.n:
                raise ValueError("start vector contains element "
                                 "not in 0.." + `self.n-1` +
                                 ": " + `i`)
            if seen.has_key(i):
                raise ValueError("start vector contains duplicate "
                                 "element: " + `i`)
            seen[i] = 1
        self.indices = indices
        self.j = self.__findj(indices)
        self.firstcall = 1

    def getrand(self, random=random.random):
        """Return a k-combination at random.

        Optional arg random specifies a no-argument function that
        returns a random float in [0., 1.).  By default, random.random
        is used.
        """

        # The trap to avoid is doing O(n) work when k is much less
        # than n.  Letting m = min(k, n-k), we actually do Python work
        # of O(m), and C-level work of O(m log m) for a sort.  In
        # addition, O(k) work is required to build the final result,
        # but at worst O(m) of that work is done at Python speed.

        n, k = self.n, self.k
        complement = 0
        if k > n/2:
            # Generate the values *not* in the combination.
            complement = 1
            k = n-k

        # Generate k distinct random values.
        result = {}
        for i in xrange(k):
            # The expected # of times thru the next loop is n/(n-i).
            # Since i < k <= n/2, n-i > n/2, so n/(n-i) < 2 and is
            # usually closer to 1:  on average, this succeeds very
            # quickly!
            while 1:
                candidate = int(random() * n)
                if not result.has_key(candidate):
                    result[candidate] = 1
                    break
        result = result.keys()
        result.sort()
        if complement:
            # We want everything in range(n) that's *not* in result.
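            # Example:  with n=10 and k originally 8, the loop above
            # may have produced result == [3, 7]; appending the
            # sentinel 10 gives avoid == [3, 7, 10], and the loop
            # below builds [0, 1, 2] + [4, 5, 6] + [8, 9].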
            avoid = result
            avoid.append(n)
            result = []
            start = 0
            for limit in avoid:
                result.extend(range(start, limit))
                start = limit + 1
        return result

class CombGen:

    def __init__(self, seq, k):
        n = len(seq)
        if not 0 <= k <= n:
            raise ValueError("k must be in 0.." + `n` + ": " + `k`)
        self.seq = seq
        self.base = CombGenBasic(n, k)

    def reset(self):
        """Restore state to that immediately after construction."""
        self.base.reset()

    def getlex(self):
        """Return next (in lexicographic index order) k-combination.

        Return None if all possibilities have been generated.
        """

        indices = self.base.getlex()
        if indices is None:
            return None
        else:
            return self.__indices2seq(indices)

    def getgray(self):
        """Return next (c, tossed, added) triple.

        c is the next k-combination in a particular Gray code order.
        tossed is the element of s removed from the last combination.
        added is the element of s added to the last combination.

        Caution:  getgray wraps around if you invoke it more than
        comb(len(s), k) times.
        """

        indices, tossed, added = self.base.getgray()
        if tossed is None:
            return (self.__indices2seq(indices), None, None)
        else:
            return (self.__indices2seq(indices),
                    self.seq[tossed],
                    self.seq[added])

    def set_start(self, start):
        """Force .getlex() or .getgray() to start at given value.

        start is a vector of k unique integers in range(len(s)), where
        k and s were passed to the CombGen constructor.

        The vector is sorted in increasing order, and is used as a
        vector of indices (into s) for the next k-combination to be
        returned by .getlex() or .getgray().

        >>> gen = CombGen("abc", 2)
        >>> for i in range(4):
        ...     print gen.getgray()
        ('ab', None, None)
        ('bc', 'a', 'c')
        ('ac', 'b', 'a')
        ('ab', 'c', 'b')

        >>> gen.set_start([0, 2]) # start with "ac"
        >>> for i in range(4):
        ...     print gen.getgray()
        ('ac', None, None)
        ('ab', 'c', 'b')
        ('bc', 'a', 'c')
        ('ac', 'b', 'a')

        >>> gen.set_start([0, 2]) # ditto
        >>> print gen.getlex(), gen.getlex(), gen.getlex()
        ac bc None
        """

        self.base.set_start(start)

    def getrand(self, random=random.random):
        """Return a k-combination at random.

        Optional arg random specifies a no-argument function that
        returns a random float in [0., 1.).  By default, random.random
        is used.
        """

        return self.__indices2seq(self.base.getrand(random))

    def __indices2seq(self, ivec):
        assert len(ivec) == self.base.k, "else internal error"
        seq = self.seq
        result = seq[0:0]   # an empty sequence of the proper type
        for i in ivec:
            result = result + seq[i:i+1]
        return result

del random

#####################################################################
# Testing.
#####################################################################

def _verifycomb(n, k, comb, inbase, baseobj=None):
    if len(comb) != k:
        print "OUCH!", this, "should have length", k

    # verify it's an increasing sequence of baseseq elements
    lastelt = None
    for elt in comb:
        if not inbase(elt):
            print "OUCH!", elt, "not in base seqeuence", n, k, comb
        if not lastelt < elt:
            print "OUCH!", elt, ">=", lastelt, n, k, comb
        lastelt = elt

    if baseobj:
        # verify search finger is correct
        cachedj = baseobj.j
        truej = baseobj._CombGenBasic__findj(baseobj.indices)
        if cachedj != truej:
            print "OUCH! cached j", cachedj, "!= true j", truej, \
                  n, k, comb

def _testnk_gray(n, k):
    start = "abcdefghijklmnopqrstuvwxyz"[:n]
    def inbase(elt, start=start):
        return elt in start
    o = CombGen(start, k)
    c = comb(n, k)
    seen = {}
    last, lastlist = None, None
    for i in xrange(c+1):
        this, tossed, added = o.getgray()
        _verifycomb(n, k, this, inbase, o.base)

        if seen.has_key(this) and i < c:
            print "OUCH!", this, "seen before at", seen[this], n, k
        seen[this] = i

        thislist = list(this)
        if (tossed is None) != (added is None):
            print "OUCH! tossed and added None clash", tossed, \
                  added, n, k, last, this
        if last is None:
            last, lastlist = this, thislist
            continue
        if tossed is not None:
            if tossed == added:
                print "OUCH! tossed == added", tossed, added, \
                      n, k, last, this
            lastlist.remove(tossed)
            lastlist.append(added)
            lastlist.sort()
        elif c != 1:
            print "OUCH! tossed None but comb(n, k) not 1", \
                  c, tossed, added, n, k, last, this
        if lastlist != thislist:
            print "OUCH! does not compute", n, k, tossed, added, \
                  last, this
        last, lastlist = this, thislist
    if last != start[:k]:
        print "OUCH! didn't wrap around", n, k, last, this

# getgray is especially delicate, so hammer on it.

def _testgray():
    """
    >>> _testgray()
    testing getgray 0
    testing getgray 1
    testing getgray 2
    testing getgray 3
    testing getgray 4
    testing getgray 5
    testing getgray 6
    testing getgray 7
    testing getgray 8
    testing getgray 9
    testing getgray 10
    testing getgray 11
    testing getgray 12
    """
    for n in range(13):
        print "testing getgray", n
        for k in range(n+1):
            _testnk_gray(n, k)

# getlex is easier.

def _testnk_lex(n, k):
    start = "abcdefghijklmnopqrstuvwxyz"[:n]
    def inbase(elt, start=start):
        return elt in start
    o = CombGen(start, k)
    c = comb(n, k)
    last = None
    for i in xrange(c):
        this = o.getlex()
        _verifycomb(n, k, this, inbase, o.base)
        if not last < this:
            print "OUCH! not lexicographic", last, this, n, k
        last = this
    this = o.getlex()
    if this is not None:
        print "OUCH! should have returned None", n, k, this

def _testlex():
    """
    >>> _testlex()
    testing getlex 0
    testing getlex 1
    testing getlex 2
    testing getlex 3
    testing getlex 4
    testing getlex 5
    testing getlex 6
    testing getlex 7
    testing getlex 8
    """
    for n in range(9):
        print "testing getlex", n
        for k in range(n+1):
            _testnk_lex(n, k)

import math
_math = math
del math

# This is a half-assed implementation, prone to overflow and/or
# underflow given "large" x or v.  If they're both <= a few hundred,
# though, it's quite accurate.  The main advantage is that it's
# self-contained.

def _chi_square_distrib(x, v):
    """x, v -> return probability that chi-square statistic <= x.

    v is the number of degrees of freedom, an integer >= 1.

    x is a non-negative float or int.
    """

    if x < 0:
        raise ValueError("x must be >= 0: " + `x`)
    if v < 1:
        raise ValueError("v must be >= 1: " + `v`)
    if v != int(v):
        raise TypeError("v must be an integer: " + `v`)
    if x == 0:
        return 0.0

    # (x/2)**(v/2) / gamma((v+2)/2) * exp(-x/2) *
    # (1 + sum(i=1 to inf, x**i/prod(j=1 to i, v+2*j)))

    # Alas, for even moderately large x or v, this is numerically
    # intractable.  But the mean of the distribution is v, so in
    # practice v will likely be "close to" x.  Rewrite the first
    # line as
    # (x/2/e)**(v/2) / gamma((v+2)/2) * exp(v/2-x/2)
    # Now exp is much less likely to over or underflow.  The power is
    # still a problem, though, so we compute
    #    (x/2/e)**(v/2) / gamma((v+2)/2)
    # via repeated multiplication.
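    # (Sanity check:  for v == 2 the result collapses to
    # 1 - exp(-x/2), i.e. the CDF of an exponential with mean 2.)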
    x = float(x)
    a = x / 2 / _math.exp(1)
    v = float(v)
    v2 = v/2
    if int(v2) * 2 == v:
        # v is even
        base = 1.0
        i = 1.0
    else:
        # v is odd, so the gamma bottoms out at gamma(.5) = sqrt(pi),
        # and we need to get a sqrt(a) factor into the numerator
        # (since v2 "ends with" .5).
        base = 1.0 / _math.sqrt(a * _math.pi)
        i = 0.5
    while i <= v2:
        base = base * (a / i)
        i = i + 1.0
    base = base * _math.exp(v2 - x/2)
    # Now do the infinite sum.
    oldsum = None
    sum = base
    while oldsum != sum:
        oldsum = sum
        v = v + 2.0
        base = base * (x / v)
        sum = sum + base
    return sum

def _chisq(observed, expected):
    n = len(observed)
    assert n == len(expected)
    sum = 0.0
    for i in range(n):
        e = float(expected[i])
        sum = sum + (observed[i] - e)**2 / e
    return sum, _chi_square_distrib(sum, n-1)

def _testrand():
    """
    >>> _testrand()
    random 0 combs of abcde
     100
    random 1 combs of abcde
    a 99
    b 106
    c 98
    d 99
    e 98
    probability[chisq <= 0.46] = 0.0227
    random 2 combs of abcde
    ab 100
    ac 115
    ad 111
    ae 98
    bc 98
    bd 103
    be 95
    cd 84
    ce 100
    de 96
    probability[chisq <= 6.6] = 0.321
    random 3 combs of abcde
    abc 83
    abd 119
    abe 86
    acd 88
    ace 103
    ade 94
    bcd 107
    bce 101
    bde 112
    cde 107
    probability[chisq <= 12.78] = 0.827
    random 4 combs of abcde
    abcd 86
    abce 99
    abde 113
    acde 101
    bcde 101
    probability[chisq <= 3.68] = 0.549
    random 5 combs of abcde
    abcde 100
    """

    def drive(s, k):
        print "random", k, "combs of", s
        o = CombGen(s, k)
        g = o.getrand
        n = len(s)
        def inbase(elt, s=s):
            return elt in s
        count = {}
        c = comb(len(s), k)
        for i in xrange(100 * c):
            x = g()
            _verifycomb(n, k, x, inbase)
            count[x] = count.get(x, 0) + 1
        items = count.items()
        items.sort()
        for x, i in items:
            print x, i
        if c > 1:
            observed = count.values()
            if len(observed) < c:
                observed.extend([0] * (c - len(observed)))
            x, p = _chisq(observed, [100]*c)
            print "probability[chisq <= %g] = %.3g" % (x, p)

    for k in range(6):
        drive("abcde", k)

__test__ = {"_testgray": _testgray,
            "_testlex":  _testlex,
            "_testrand": _testrand}

def _test():
    import doctest, combgen
    doctest.testmod(combgen)

if __name__ == "__main__":
    _test()
