Prime number module

Tim Hochberg tim.hochberg at ieee.org
Mon Sep 29 19:55:21 EDT 2003


Lulu of the Lotus-Eaters wrote:
> Stephen Horne <$$$$$$$$$$$$$$$$$@$$$$$$$$$$$$$$$$$$$$.co.uk> wrote previously:
> |I just wrote a fairly simple sieve, which gave primes up to 1,000,000
> |in a few seconds (1.8GHz P4, Python code). There were 78498 primes in
> |that range...at least in theory there should be less than 100 times as
> |many primes in the range up to 10^8.
> 
> Quite a few less, actually.  Under Gauss' Prime Number Theorem, an
> approximation for the number of primes less than N is N/ln(N).  I know
> there have been some slight refinements in this estimate since 1792, but
> in the ranges we're talking about, it's plenty good enough.
> 
> So I only expect around 5,428,681 primes less than 10^8 to occur.  Well,
> that's not SO much less than 7.8M.
> 
> |So here's the thought - a binary file containing a complete list of
> |primes up to 10^8 would require roughly 30MB (using 32 bit integers,
> |which should comfortably handle your requirement).
> 
> I wonder if such a data structure is really necessary.  Certainly it
> produces a class of answers quite quickly.  I.e. search for a prime, and
> its offset immediately gives you the number of primes less than it.
> Heck, searching for any number, even a composite occurring between
> primes, works pretty much the same way.  Of course, the above
> approximation gives you a close answer even quicker.
> 
> But if you are worried about disk storage, one easy shortcut is to store
> a collection of 16-bit differences between successive primes.  That's
> half the size, and still lets you answer the desired question *pretty*
> quickly (extra addition is required)... or at least generate a local
> copy of Horne's data structure in one pass.
> 
> Moving farther, even this gap structure is quite compressible.  Most
> gaps are quite a bit smaller than 65536, so the highbits are zeros.  In
> fact, I am pretty sure that almost all the gaps are less than 256.  So
> an immediate compression strategy (saving disk space, costing time to
> recreate the transparent structure) is to store gaps as 8-bit values,
> with a x00 byte escaping into a larger value (I guess in the next two
> bytes).
> 
> Maybe I'll try it, and see how small I can make the data... unless I do
> my real work :-).

I believe you could implement a hybrid scheme that would be quite fast 
and still maintain nearly the same level of compression that you 
describe above. In addition to the above compressed data, also store, 
uncompressed, every Nth prime. A binary search will get you within N 
primes of your answer, to find the exact value, recreate those N-primes. 
For a N of, for instance, 64 the level of compression would be minimally 
affected but should make finding the number of primes less than a given 
number number much faster than the basic compressed scheme.

In fact I wouldn't be suprised if this was faster than the uncompressed 
scheme since you're less likely to thrash your memory.

-tim

> Yours, Lulu...
> 
> --
> Keeping medicines from the bloodstreams of the sick; food from the bellies
> of the hungry; books from the hands of the uneducated; technology from the
> underdeveloped; and putting advocates of freedom in prisons.  Intellectual
> property is to the 21st century what the slave trade was to the 16th.
> 
> 





More information about the Python-list mailing list