Prime number algo... what's wrong?

Sat Mar 22 10:08:42 EST 2003

L. B. wrote:

> Hi all,
> 
> it's two days i'm trying to make this stuff work!! ;-) I really may be
> dumb! Now i think i'm on the right way and close to the result but
> still the algo doesn't work... could you please tell me what i'm
> missing?

You're missing Aristotle's "Law of the Excluded Middle" - the heart
of your code is:

>         if test != 0:
>             x += 1 #still may be prime so go on trying
>         elif test == 0:
>             mayP += 1 #is not prime, test the second prime candidate
>         else:
>             pCount += 1 #increment counter
>             isP = mayP #may --> is
>             print "%u | %u" % (pCount, isP) #uhm... prints results?!

and of course the 'else' branch can NEVER be taken -- since test
is either different from or equal to zero, there IS no third case.

> Also are you aware of more sophisticated and efficient algos for prime
> testing implemented in Python?

David Eppstein has a neat Eratosthenes-sieve-based generator in
the online cookbook, at URL
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/117119

Unfortunately the indents are messed up, at least as Opera shows
me the code.  So here's a comsmetically enhanced version (for
Python 2.3, showing off a minor enhancement;-):

def primes():
    q = 2                    # next candidate for primeness
    D = {}                   # map non-primes to their factors
    while True:              # infinite generator, terminate otherwise
        ps = D.pop(q, None)  # get & remove known factors (if any)
        if ps:               # has some factors -> it's not a prime
            for p in ps:     # move each factor to its next multiple
                D.setdefault(p+q, []).append(p)
        else:                # has no factors -> it's a prime
            D[q*q] = [q]     # mark the only newly-discovered multiple
            yield q          # yield each prime one after the other
        q += 1               # next candidate

if __name__ == '__main__':
    # example: print the first 20 primes
    for i, p in enumerate(primes()):
        print p,
        if i>=20: break

prints 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73

It may be more instructive to add a "print q, D" right after the
"while True:" loop header to show what's going on.  Here's the
result, with the loop limited to just 8 primes to avoid too much
output, and a clearer print "%d-th prime is %d" % (i+1, p) in
the main loop's body:

2 {}
1-th prime is 2
3 {4: [2]}
2-th prime is 3
4 {9: [3], 4: [2]}
5 {9: [3], 6: [2]}
3-th prime is 5
6 {9: [3], 6: [2], 25: [5]}
7 {8: [2], 9: [3], 25: [5]}
4-th prime is 7
8 {49: [7], 25: [5], 8: [2], 9: [3]}
9 {49: [7], 25: [5], 9: [3], 10: [2]}
10 {49: [7], 25: [5], 10: [2], 12: [3]}
11 {49: [7], 25: [5], 12: [3, 2]}
5-th prime is 11
12 {49: [7], 25: [5], 121: [11], 12: [3, 2]}
13 {49: [7], 25: [5], 121: [11], 14: [2], 15: [3]}
6-th prime is 13
14 {49: [7], 169: [13], 25: [5], 121: [11], 14: [2], 15: [3]}
15 {16: [2], 49: [7], 169: [13], 25: [5], 121: [11], 15: [3]}
16 {16: [2], 49: [7], 18: [3], 121: [11], 25: [5], 169: [13]}
17 {49: [7], 18: [3, 2], 121: [11], 25: [5], 169: [13]}
7-th prime is 17
18 {49: [7], 18: [3, 2], 121: [11], 25: [5], 169: [13], 289: [17]}
19 {49: [7], 121: [11], 21: [3], 25: [5], 169: [13], 289: [17], 20: [2]}
8-th prime is 19

we see that in this version having a list of known factors
corresponding to each nonprime that is a current key in dict D
is necessary to deal with numbers that have more than one
prime factor smaller than their square root, such as 12 and
18 in this small run.  Alternatively, therefore, one might
choose a tiny variant for D's values, keeping them to single
numbers but also ensuring that an entry is never duplicated:
it suffices to change the "while True:" loop's body to:

        p = D.pop(q, None) 
        if p:
            x = p + q
            while x in D: x += p
            D[x] = p
        else:
            D[q*q] = q
            yield q
        q += 1

and this proves to be a small but worthwhile optimization, e.g
with the following test code to check we're finding the smalles
prime larger than 100,000, and check our runtime:

import time

start=time.clock()
for p in primes():
    if p>100000: break
stend=time.clock()

print p
print stend-start

on my machine, the original version using a dict of lists takes
about 0.9 CPU seconds to come up with the result (100003), the
modified version using a dict with numbers as values takes about
0.5 CPU seconds.  Similarly, for the smallest prime larger than
1,000,000, we have 10.34 CPU seconds for the original version,
6.13 for the modified one -- again, a small but useful improvement.

Alex