[issue37295] Possible optimizations for math.comb()

Tue Dec 21 14:32:57 EST 2021

Tim Peters <tim at python.org> added the comment:

Clever, Mark! Very nice.

The justification for the shift count isn't self-evident, and appears to me to be an instance of the generalization of Kummer's theorem to multinomial coefficients. I think it would be clearer at first sight to rely instead on that 2**i/(2**j * 2**k) = 2**(i-j-k), which is shallow.

So here's a minor rewrite doing that instead; it would add 68 bytes to the precomputed static data.

    import math
    # Largest n such that comb(n, k) fits in 64 bits for all k.
    Nmax = 67

    # Express n! % 2**64 as Fodd[n] << Fntz[n] where Fodd[n] is odd.
    Fodd = [] # unsigned 8-byte ints
    Fntz = [] # unsigned 1-byte ints
    for i in range(Nmax + 1):
        f = math.factorial(i)
        lsb = f & -f # isolate least-significant 1 bit
        Fntz.append(lsb.bit_length() - 1)
        Fodd.append((f >> Fntz[-1]) % 2**64)

    Finv = [pow(fodd, -1, 2**64) for fodd in Fodd]

    # All of the above is meant to be precomputed; just static tables in C.

    # Fast comb for small values.
    def comb_small(n, k):
        if not 0 <= k <= n <= Nmax:
            raise ValueError("k or n out of range")
        return ((Fodd[n] * Finv[k] * Finv[n-k] % 2**64)
                << (Fntz[n] - Fntz[k] - Fntz[n-k]))

    # Exhaustive test
    for n in range(Nmax+1):
        for k in range(0, n+1):
            assert comb_small(n, k) == math.comb(n, k)

Since 99.86% of comb() calls in real life are applied to a deck of cards ;-) , it's valuable that Nmax be >= 52.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue37295>
_______________________________________