List Count

Blind Anagram blindanagram at nowhere.org
Tue Apr 23 03:00:50 EDT 2013


On 23/04/2013 00:28, Steven D'Aprano wrote:
> On Mon, 22 Apr 2013 22:25:50 +0100, Blind Anagram wrote:
> 
>> I have looked at solutions based on listing primes and here I have found
>> that they are very much slower than my existing solution when the sieve
>> is not large (which is the majority use case).
> 
> Yes. This is hardly surprising. Algorithms suitable for dealing with the 
> first million primes are not suitable for dealing with the first trillion 
> primes, and vice versa. We like to pretend that computer programming is 
> an abstraction, and for small enough data we often can get away with 
> that, but like all abstractions eventually it breaks and the cost of 
> dealing with real hardware becomes significant.
> 
> But I must ask, given that the primes are so widely distributed, why are 
> you storing them in a list instead of a sparse array (i.e. a dict)? There 
> are 50,847,534 primes less than or equal to 1,000,000,000, so you are 
> storing roughly 18 False values for every True value. That ratio will 
> only get bigger. With a billion entries, you are using 18 times more 
> memory than necessary.

Because the majority use case for my Prime class is a sieve that is not
large.  I am just pushing the envelope for a minority use case so that it
still works for huge sieves, albeit inefficiently.

I accept that it is inefficient, but the fact remains that I can produce
a sieve that can yield and count a billion primes in a reasonable time.
This fails, however, when I wish to count over only part of the sieve,
because the lack of a list.count(value, limit) function forces me to copy
a slice, which can double the memory requirement.
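(The best workaround I have found so far is to combine sum() with
itertools.islice, which iterates over the prefix lazily and never
materialises a slice.  It is slower than a C-level slice-and-count, but
it needs no extra memory.  A sketch, assuming the sieve is a list of
booleans; `count_up_to` is a hypothetical helper:)

```python
from itertools import islice

def count_up_to(sieve, limit):
    """Count True flags in sieve[:limit] without copying a slice.

    islice yields the first `limit` entries lazily, so no second copy
    of the sieve is created; sum() counts True as 1 and False as 0.
    """
    return sum(islice(sieve, limit))

# Flags for 0..29, with True marking a prime.
sieve = [i in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29) for i in range(30)]
print(count_up_to(sieve, 10))   # primes below 10: 2, 3, 5, 7 -> 4
```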

I would not dream of doing this job by copying a slice in any other
language that I have used, so I was simply asking for advice to discover
whether this copy could be avoided whilst staying with the simple sieve
design.

Thank you for your input.

   Brian



