Prime number module

Lulu of the Lotus-Eaters mertz at gnosis.cx
Tue Sep 30 17:19:58 EDT 2003


Taking up Bengt's suggestion, and some various others, I created a bit
array version of the primality data structure.  That is, each bit of the
file 'primes.bitarray' encodes the primality of the number corresponding
to its bit position.  However, only ODD numbers are mentioned in this
file, since it's rather easy to rule out evens (and special case 2).
This produces a 6.25 mB file for the first 10**8 numbers.

I was disappointed that gzip only reduces the size by 41%.  I would have
guessed better, given the predominance of 0 bits.  Still, no reason not
to use the [gzip] module to read the data.  You can download the data
file, if you wish, at:

    http://gnosis.cx/download/primes.bitarray.gz

I decided to check how quickly I can check primality using this data
structure.  Pretty well, to my mind (for reference, I have a PIII
733Mhz; nothing very fast for nowadays).  Here's the code:

    #!/usr/bin/python
    import gzip

    bit_source = gzip.open('primes.bitarray.gz')
    prime_bits = bit_source.read(100000)

    def isPrime(N):
        "Check primality on pre-generated bit field (read more as needed)"
        if N < 2:       return False
        elif N==2:      return True
        elif not N % 2: return False
        else:
            # Check if we need to extend prime_bits
            global prime_bits
            if 8*len(prime_bits) < N:
                needed = 1+(N//8) - len(prime_bits)
                prime_bits += bit_source.read(needed)
            byte, bit = divmod(N, 16)
            bit = 7 - bit//2
            return ord(prime_bits[byte]) & 2**bit

    if __name__=='__main__':
        from time import clock
        from random import randint
        # generate a million numbers to check
        tests = []
        for _ in xrange(10**6):
            tests.append(randint(1,10**8))
        start = clock()
        primes = 0
        for n in tests:
            if isPrime(n): primes += 1
        print "%d primes in 1e6 numbers (%.2f secs)" % (primes,clock()-start)

Here's the timing:

    % test_bits.py
    57546 primes in 1e6 numbers (29.57 secs)

I figured there was no need to read the whole data file in unless a
check required it--I just read in a smallish 100k at initialization.  Of
course, in the course of a million tests, I am pretty sure I wind up
reading in essentially the whole file at some point.

One nice thing to do would be to make 'bit_source' a bit smarter.  I.e.
it could be a class that generated more primes when needed (maybe using
gmpy as Alex suggests).  In principle, 'isPrime()' could then check
unbounded primes (subject to time and resource constraints).

The next thing would probably be to store sparce accumulations as Bengt
suggested (and Alex implemented in a different fashion).  That is,
another structure could store "every 1000th prime" in order to answer
questions like "how-many-less-than" and "which-is-Nth-prime".

Maybe I'll Emile's bitcounting techniques to do this.

Yours, Lulu...

--
mertz@  | The specter of free information is haunting the `Net!  All the
gnosis  | powers of IP- and crypto-tyranny have entered into an unholy
.cx     | alliance...ideas have nothing to lose but their chains.  Unite
        | against "intellectual property" and anti-privacy regimes!
-------------------------------------------------------------------------






More information about the Python-list mailing list