[issue8692] Use divide-and-conquer for faster factorials

Thu May 13 23:50:48 CEST 2010

Alexander Belopolsky <belopolsky at users.sourceforge.net> added the comment:

On Thu, May 13, 2010 at 5:31 PM, Mark Dickinson <report at bugs.python.org> wrote:
> And why are we trying to speed up the pure Python factorial code here?

I would expect that for large factorials the performance will be
determined by the number of long multiplications and the size of
multiplicands.

>Â I don't imagine that those speed differences are going to translate well to C.

The differences between recursive and non-recursive versions are not
likely to translate well, but the difference (if any) between the
order of multiplication most likely will.

In any case, I am attaching fixed version of factorial4.

$ ./python.exe -m timeit -s "from factorial4 import f0 as f" "f(10000)"
10 loops, best of 3: 65.5 msec per loop
$ ./python.exe -m timeit -s "from factorial4 import f1 as f" "f(10000)"
10 loops, best of 3: 66.9 msec per loop
$ ./python.exe -m timeit -s "from factorial4 import f2 as f" "f(10000)"
10 loops, best of 3: 56.5 msec per loop
$ ./python.exe -m timeit -s "from factorial4 import f3 as f" "f(10000)"
10 loops, best of 3: 63 msec per loop

----------
Added file: http://bugs.python.org/file17320/factorial4.py

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue8692>
_______________________________________
-------------- next part --------------
import functools
import operator

product = functools.partial(functools.reduce, operator.mul)

def partial_product(j, i):
    if i == j:
        return j << 1 | 1
    if i == j + 1:
        return (j << 1 | 1) * (i << 1 | 1)
    l = i + j >> 1
    return partial_product(j, l) * partial_product(l + 1, i)

def naive_factorial(n):
    """Naive implementation of factorial: product([1, ..., n])

    >>> naive_factorial(4)
    24
    """
    return product(range(1, n+1), 1)

def factorial(n, pp=partial_product):
    """Implementation of Binary-Split Factorial algorithm

    See http://www.luschny.de/math/factorial/binarysplitfact.html

    >>> for n in range(20):
    ...     assert(factorial(n) == naive_factorial(n))
    >>> import math
    >>> assert(factorial(100) == math.factorial(100))
    """
    _, r = loop(n, pp)
    return r << (n - count_bits(n))

def loop(n, pp):
    p = r = 1
    for i in range(n.bit_length() - 2, -1, -1):
        m = n >> i
        if m > 2:
            p *= pp(((m >> 1) + 1) >> 1, (m - 1) >> 1)
            r *= p
    return p, r

def partial_product1(j, i):
    return product((l << 1 | 1 for l in range(j, i + 1)), 1)

def partial_product2(j, i):
    a = [l << 1 | 1 for l in range(j, i + 1)]
    n = i - j + 1
    p = 1
    while n > 1:
        for k in range(n>>1):
            a[k] *= a[n-k-1]
        n = (n>>1) + (n&1)
    return a[0]

def partial_product3(j, i):
    a = [l << 1 | 1 for l in range(j, i + 1)]
    while 1:
        n = len(a)
        if n == 1:
            return a[0]
        a = [a[k<<1] * a[k<<1|1] for k in range(n>>1)] + a[(n >> 1)<<1:]

def partial_product4(j, i):
    if i == j:
        return j << 1 | 1
    if i == j + 1:
        return (j << 1 | 1) * (i << 1 | 1)
    return (j << 1 | 1) * (i << 1 | 1) * partial_product4(j + 1, i - 1)

def count_bits(n):
    count = 0
    while n:
        n &= n - 1
        count += 1
    return count

def f0(n):
    return factorial(n)

def f1(n):
    return factorial(n, partial_product1)

def f2(n):
    return factorial(n, partial_product2)

def f3(n):
    return factorial(n, partial_product3)

if __name__ == '__main__':
    for n in list(range(12, 20)) + [100, 101, 200]:
        assert(f0(n) == f1(n) == f2(n) == f3(n))
    import doctest
    doctest.testmod()