How can I speed up a script that iterates over a large range (600 billion)?

Wed Jun 22 08:01:06 EDT 2011

Chris Torek wrote:

> Now that the exercise has been solved...
> 
> Instead of "really short code to solve the problem", how about
> some "really long code"? :-)
> 
> I was curious about implementing prime factorization as a generator,
> using a prime-number generator to come up with the factors, and
> doing memoization of the generated primes to produce a program that
> does what "factor" does, e.g.:
> 
>     $ factor 99999999999999999
>     99999999999999999: 3 3 2071723 5363222357
> 
> The python code is rather too slow for this particular number (I
> gave up on it finding 5363222357) 

It shouldn't take more than a few seconds to factorise 10**17-1 even in pure
Python. On my not-very-powerful PC, using a home-brew pure-Python function
(code to follow), I get all four factors in under four seconds.

In comparison, I ran your script for over five minutes before giving up, it
still hadn't found the fourth factor.

Don't be disappointed though, you're in illustrious company. Getting the
Sieve of Eratosthenes *badly* wrong is one of the most common mistakes,
second only to getting involved in land wars in Asia. See this paper by
Melissa O'Neill:

http://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf

What people often describe as the Sieve of Eratosthenes is frequently the
Sieve of Euler, which while mathematically elegant, is computationally
crap. As O'Neill calls it, the Unfaithful Sieve.

In my home-brew primes module, I have this to say about three common
algorithms, including the classic Unfaithful Sieve, Turner's algorithm:

# === Prime number generators to avoid ===

# The following three prime generators are supplied for educational
# purposes, specifically as a terrible warning on how NOT to do it.
#
# Their performance starts at bad in the case of trial_division(), falls to
# terrible with turner(), and ends up with the utterly awful
# naive_division(). None of these three have acceptable performance; they
# are barely tolerable even for the first 100 primes. Their only advantage
# is that they are somewhat easy to understand.

Here's my factors() function. Adapting it to be a generator is left as an
exercise.

def factors(n):
    """factors(integer) -> [list of factors]

    >>> factors(231)
    [3, 7, 11]

    Returns the (mostly) prime factors of integer n. For negative integers,
    -1 is included as a factor. If n is 0 or 1, [n] is returned as the only
    factor. Otherwise all the factors will be prime.
    """
    if n != int(n):
        raise ValueError('argument must be an integer')
    elif n in (0, 1, -1):
        return [n]
    elif n < 0:
        return [-1] + factors(-n)
    assert n >= 2
    result = []
    for p in sieve():
        if p*p > n: break
        while n % p == 0:
            result.append(p)
            n //= p
    if n != 1:
        result.append(n)
    return result

def sieve():
    """Yield prime integers efficiently.

    This uses the Sieve of Eratosthenes, modified to generate the primes
    lazily rather than the traditional version which operates on a fixed
    size array of integers.
    """
    # This implementation based on a version by Melissa E. O'Neill,
    # as reported by Gerald Britton:
    # http://mail.python.org/pipermail/python-list/2009-January/1188529.html
    innersieve = sieve()
    prevsq = 1
    table  = {}
    i = 2
    while True:
        if i in table:
            prime = table[i]
            del table[i]
            nxt = i + prime
            while nxt in table:
                nxt += prime
            table[nxt] = prime
        else:
            yield i
            if i > prevsq:
                j = next(innersieve)
                prevsq = j**2
                table[prevsq] = j
        i += 1

I don't claim that my version of the sieve above will break any
computational records, but it performs quite well for my needs.

-- 
Steven