Prime number module

Lulu of the Lotus-Eaters mertz at gnosis.cx
Mon Sep 29 17:20:21 EDT 2003


Stephen Horne <$$$$$$$$$$$$$$$$$@$$$$$$$$$$$$$$$$$$$$.co.uk> wrote previously:
|I just wrote a fairly simple sieve, which gave primes up to 1,000,000
|in a few seconds (1.8GHz P4, Python code). There were 78498 primes in
|that range...at least in theory there should be less than 100 times as
|many primes in the range up to 10^8.

Quite a few less, actually.  Under Gauss' Prime Number Theorem, an
approximation for the number of primes less than N is N/ln(N).  I know
there have been some slight refinements in this estimate since 1792, but
in the ranges we're talking about, it's plenty good enough.

So I only expect around 5,428,681 primes less than 10^8 to occur.  Well,
that's not SO much less than 7.8M.

|So here's the thought - a binary file containing a complete list of
|primes up to 10^8 would require roughly 30MB (using 32 bit integers,
|which should comfortably handle your requirement).

I wonder if such a data structure is really necessary.  Certainly it
produces a class of answers quite quickly.  I.e. search for a prime, and
its offset immediately gives you the number of primes less than it.
Heck, searching for any number, even a composite occurring between
primes, works pretty much the same way.  Of course, the above
approximation gives you a close answer even quicker.

But if you are worried about disk storage, one easy shortcut is to store
a collection of 16-bit differences between successive primes.  That's
half the size, and still lets you answer the desired question *pretty*
quickly (extra addition is required)... or at least generate a local
copy of Horne's data structure in one pass.

Moving farther, even this gap structure is quite compressible.  Most
gaps are quite a bit smaller than 65536, so the highbits are zeros.  In
fact, I am pretty sure that almost all the gaps are less than 256.  So
an immediate compression strategy (saving disk space, costing time to
recreate the transparent structure) is to store gaps as 8-bit values,
with a x00 byte escaping into a larger value (I guess in the next two
bytes).

Maybe I'll try it, and see how small I can make the data... unless I do
my real work :-).

Yours, Lulu...

--
Keeping medicines from the bloodstreams of the sick; food from the bellies
of the hungry; books from the hands of the uneducated; technology from the
underdeveloped; and putting advocates of freedom in prisons.  Intellectual
property is to the 21st century what the slave trade was to the 16th.






More information about the Python-list mailing list