List Count

Blind Anagram blindanagram at nowhere.org
Mon Apr 22 17:32:30 EDT 2013


On 22/04/2013 22:03, Oscar Benjamin wrote:
> On 22 April 2013 21:18, Oscar Benjamin <oscar.j.benjamin at gmail.com> wrote:
>> On 22 April 2013 17:38, Blind Anagram <blindanagram at nowhere.org> wrote:
>>> On 22/04/2013 17:06, Oscar Benjamin wrote:
>>>
>>>> I don't know what your application is but I would say that my first
>>>> port of call here would be to consider a different algorithmic
>>>> approach. An obvious question would be about the sparsity of this data
>>>> structure. How frequent are the values that you are trying to count?
>>>> Would it make more sense to store a list of their indices?
>>>
>>> Actually it is no more than a simple prime sieve implemented as a Python
>>> class (and, yes, I realize that there are plenty of these around).
>>
>> If I understand correctly, you have a list of roughly a billion
>> True/False values indicating which integers are prime and which are
>> not. You would like to discover how many prime numbers there are
>> between two numbers a and b. You currently do this by counting the
>> number of True values in your list between the indices a and b.
>>
>> If my description is correct then I would definitely consider using a
>> different algorithmic approach. The density of primes from 1 to 1
>> billlion is about 5%. Storing the prime numbers themselves in a sorted
>> list would save memory and allow a potentially more efficient way of
>> counting the number of primes within some interval.
> 
> In fact it is probably quicker if you don't mind using all that memory
> to just store the cumulative sum of your prime True/False indicator
> list. This would be the prime counting function pi(n). You can then
> count the primes between a and b in constant time with pi[b] - pi[a].

I did wonder whether, after creating the sieve, I should simply go
through the list and replace the True values with a count.  This would
certainly speed up the prime count function, which is where the issue
arises.  I will try this and see what sort of performance trade-offs
this involves.

  Brian




More information about the Python-list mailing list