Iteration over recursion?

Tim Peters tim.peters at gmail.com
Wed Jun 21 07:37:41 EDT 2006


[Kay Schluehr]
> You might use a separate prime generator to produce prime factors. The
> factorize algorithm becomes quite simple and configurable by prime
> generators.

Alas, yours was _so_ simple that it always takes time proportional to
the largest prime factor of n (which may be n itself) instead of
worst-case time proportional to sqrt(n).  Repair that, and it's not
quite so simple anymore (one possible repair is sketched after your
code below).  The sieve generator's speed would also benefit a lot
from never looking at even integers, beyond an initial "yield 2" ...
the OP said he was looking for speed more than elegance, and they're
not good buddies here ;-)

> def eratosthenes():
>     # Incremental sieve:  memo maps each known upcoming composite
>     # to a prime that divides it.
>     memo = {}
>     q = 2
>     while True:
>         p = memo.pop(q, None)
>         if p is None:
>             # q was never marked composite, so it's prime; its first
>             # multiple worth marking is q*q.
>             yield q
>             memo[q*q] = q
>         else:
>             # q is composite; slide p's marker to the next multiple
>             # of p not already claimed by another prime.
>             x = p + q
>             while x in memo:
>                 x += p
>             memo[x] = p
>         q += 1
>
> def factorize(n, sieve=eratosthenes):
>     if n <= 1:
>         return [n]
>     factors = []
>     primes = sieve()
>     # Pull primes in increasing order, dividing each out of n as
>     # often as possible; stop the moment n is reduced to 1.
>     for q in primes:
>         while n % q == 0:
>             factors.append(q)
>             n //= q
>             if n == 1:
>                 return factors
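
For concreteness, here's one shape the repair could take -- a sketch
of the standard fix, not code from the thread (the name
factorize_repaired is mine):  stop pulling primes once q*q > n, at
which point whatever remains of n must itself be prime.

def factorize_repaired(n, sieve=eratosthenes):
    # Stop once q*q > n:  any remaining cofactor > 1 is prime.  That
    # caps the worst case at time proportional to sqrt(n) instead of
    # to n's largest prime factor.
    if n <= 1:
        return [n]
    factors = []
    for q in sieve():
        if q * q > n:
            break
        while n % q == 0:
            factors.append(q)
            n //= q
    if n > 1:
        factors.append(n)  # leftover cofactor is prime
    return factors

And the "never look at evens" tweak could look like this (again just
a sketch):  yield 2 by hand, then sieve odd candidates only, stepping
each marker by 2*p so it stays odd.

def eratosthenes_odd():
    yield 2
    memo = {}
    q = 3
    while True:
        p = memo.pop(q, None)
        if p is None:
            yield q
            memo[q*q] = q  # odd * odd is odd
        else:
            x = q + 2*p  # skip the even multiples of p
            while x in memo:
                x += 2*p
            memo[x] = p
        q += 2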

At _some_ point you might think that's bound to be faster than only
skipping multiples of 2 and 3, because it does fewer divisions.  But
to get to a particular prime p, the sieve there has to go around its
outer loop about 3x as often as the trial() function (from earlier in
the thread; a sketch follows) goes around its inner loop, and it
grows a dict with about p/ln(p) entries (while trial()'s memory use
is constant), and those aren't free.
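
trial() isn't reproduced in this message, so the following is only a
guess at its shape:  plain trial division that divides out 2 and 3,
then tests only candidates of the form 6k +- 1 (i.e., skipping
multiples of 2 and 3).

def trial(n):
    # Hypothetical reconstruction:  constant memory, worst case
    # proportional to sqrt(n), and about p/3 division candidates to
    # reach a prime p.
    if n <= 1:
        return [n]
    factors = []
    for q in (2, 3):
        while n % q == 0:
            factors.append(q)
            n //= q
    q, step = 5, 2  # 5, 7, 11, 13, ...:  alternating steps of 2 and 4
    while q * q <= n:
        while n % q == 0:
            factors.append(q)
            n //= q
        q += step
        step = 6 - step  # toggle 2 <-> 4
    if n > 1:
        factors.append(n)  # leftover cofactor is prime
    return factors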

Applied to 9999991**2 (constructed so that both functions take time
proportional to sqrt(n)), on my box the factorize() function took 4x
longer than trial() (approximately 16 seconds versus 4).  Trial
division isn't practical for such large inputs, but I picked the
square of a "large" prime to give factorize() maximum advantage in
skipping division attempts (by the time we get to a prime p,
factorize() tries about p/ln(p) divisions, while trial() does about
p/3; so trial() does about ln(p)/3 times as many divisions as
factorize():  the larger the p we need, the larger that _factor_
gets).
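
For anyone who wants to rerun the comparison (absolute times will of
course differ from these 2006 numbers), a minimal harness along these
lines would do; bench() is my name, not anything from the thread, and
trial() here means the sketch above:

import time

def bench(f, n):
    # One-shot wall-clock timing; crude, but fine at multi-second
    # scales like these.
    start = time.time()
    f(n)
    return time.time() - start

n = 9999991 ** 2
print("factorize():", bench(factorize, n), "seconds")
print("trial():    ", bench(trial, n), "seconds")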

Alas, it appears that made the dict so big that cache faults on dict
accesses more than wiped out the division advantage.  At the smaller
999983**2, factorize() took only about 3.6x longer, despite losing
some of its relative division advantage.

In short, it's never what you think it is ;-)


