Throw the cat among the pigeons

Wed May 6 00:05:23 EDT 2015

On Sun, 3 May 2015 12:20 am, Cecil Westerhof wrote:

[...]
> But to my surprise tail recursion could even be more efficient. I
> wrote two different versions of factorial with self implemented tail
> recursion. For bigger values both are more efficient. And I expect
> that if the tail recursion is done by the compiler instead of by hand,
> it will be a little faster still.

I decided to do some experimentation.

Firstly, when timing code, one should minimise the amount of "scaffolding"
used around the code you care about. Using a single function like this:

def factorial_iterative(x, flag):
    if flag:
        # first implementation
    else:
        # second implementation

is a bad idea, because you are not just timing the implementations, but also
the scaffolding that selects between them. Also, you increase the size of
the function, which increases the chance of cache misses and other
processor effects. So first step is to split the implementations into
separate functions. Here are my four implementations, the first two are
taken from your code:

def factorial_forloop1(n):
    assert n >= 0
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

def factorial_forloop2(n):
    # Version with a silly extra variable.
    assert n >= 0
    result = 1
    j=2
    for i in range(2, n + 1):
        result *= j
        j += 1
    return result

from operator import mul
try:
    reduce
except NameError:
    from functools import reduce

def factorial_reduce(n):
    assert n >= 0
    return reduce(mul, range(2, n+1), 1)

def factorial_while(n):
    assert n >= 0
    result = 1
    while n > 1:
        result *= n
        n -= 1
    return result

A quick test to verify that they return the same results:

py> factorial_while(10)
3628800
py> factorial_reduce(10)
3628800
py> factorial_forloop1(10)
3628800
py> factorial_forloop2(10)
3628800

There's no point in optimising code that does the wrong thing!

Now, let's do some timing tests. It is best to use a well-tested timing
framework rather than invent your own dodgy one, so I use the timeit
module. Read the comments in the module to see why rolling your own is a
bad idea.

I'll start with some relative small input sizes. Remember, the inputs may be
small, but the outputs will be very large.

from timeit import Timer
code = "fact(10); fact(20); fact(30)"
t1 = Timer(code, setup="from __main__ import factorial_while as fact")
t2 = Timer(code, setup="from __main__ import factorial_reduce as fact")
t3 = Timer(code, setup="from __main__ import factorial_forloop1 as fact")
t4 = Timer(code, setup="from __main__ import factorial_forloop2 as fact")

I'm in a bit of a hurry, so I may be a bit more slap-dash than normal.
Normally, I would pick a larger number of trials, and a larger number of
iterations per trial, but here I'm going to use just best of three trials,
each of 10000 iterations each:

for t in (t1, t2, t3, t4):
    print(min(t.repeat(repeat=3, number=10000)))

which prints:

0.22797810286283493  # while
0.17145151272416115  # reduce
0.16230526939034462  # for-loop
0.22819281555712223  # silly for-loop

(Comments added by me.) See my earlier post for why the minimum is the only
statistic worth looking at. These results show that:

- the version using while is significantly slower than the more
straightforward iterative versions using either reduce or a for loop.

- adding an extra, unnecessary, variable to the for-loop, and incrementing
it by hand, carries as much overhead as using a while loop;

- reduce is slightly slower than a pure Python for-loop (at least according
to this simple trial, on my computer -- your results may vary);

- the obvious winner is the straightforward iterative version with a
for-loop.

Now I'm going to test it with a larger input:

py> big = factorial_forloop1(50000)
py> sys.getsizeof(big)  # number of bytes
94460
py> len(str(big))  # number of digits
213237

Recreate the timer objects, and (again, because I'm in something of a hurry)
do a best-of-three with just 2 iterations per trial.

code = "fact(50000)"
t1 = Timer(code, setup="from __main__ import factorial_while as fact")
t2 = Timer(code, setup="from __main__ import factorial_reduce as fact")
t3 = Timer(code, setup="from __main__ import factorial_forloop1 as fact")
t4 = Timer(code, setup="from __main__ import factorial_forloop2 as fact")
for t in (t1, t2, t3, t4):
    print(min(t.repeat(repeat=3, number=2)))

which takes about two minutes on my computer, and prints:

8.604736926034093  # while loop
10.786483339965343  # reduce
10.91099695302546  # for loop
10.821452282369137  # silly version of the for loop

(Again, annotations are by me.)

These results are fascinating, and rather surprising. I think that they
demonstrate that, at this size of input argument, the time is dominated by
processing the BigNum int objects, not the iteration: whether we use
reduce, a straight-forward for-loop, or a for-loop with an extra variable
makes very little difference, of the order of 1%.

(I wouldn't read too much into the fact that the for-loop which does *more*
work, but manually adjusting a second variable, is slightly faster. Unless
that can be proven to be consistently the case, I expect that is just
random noise. I ran some quick trials, and did not replicate that result:

py> for t in (t3, t4, t3, t4):
...     print(min(t.repeat(repeat=3, number=1)))
...
5.282072028145194  # for-loop
5.415546240285039  # silly for-loop
5.358346642926335  # for-loop
5.328046130016446  # silly for-loop

Note that, as expected, doing two iterations per trial takes about twice as
long as one iteration per trial: 5 seconds versus 10.

What is surprising is that for very large input like this, the while loop is
significantly faster than reduce or either of the for-loops. I cannot
explain that result.

(Just goes to show that timing code in Python can surprise you when you
least expect it.)

Oh, and for the record, I'm using Python 3.3 on Linux:

py> sys.version
'3.3.0rc3 (default, Sep 27 2012, 18:44:58) \n[GCC 4.1.2 20080704 (Red Hat
4.1.2-52)]'

Results may vary on other platforms and versions.

-- 
Steven