List Count

Tue Apr 23 03:02:26 EDT 2013

On 23/04/2013 02:47, Dave Angel wrote:
> On 04/22/2013 05:32 PM, Blind Anagram wrote:
>> On 22/04/2013 22:03, Oscar Benjamin wrote:
>>> On 22 April 2013 21:18, Oscar Benjamin <oscar.j.benjamin at gmail.com> wrote:
>>>> On 22 April 2013 17:38, Blind Anagram <blindanagram at nowhere.org> wrote:
>>>>> On 22/04/2013 17:06, Oscar Benjamin wrote:
>>>>>
>>>>>> I don't know what your application is but I would say that my first
>>>>>> port of call here would be to consider a different algorithmic
>>>>>> approach. An obvious question would be about the sparsity of this
>>>>>> data structure. How frequent are the values that you are trying to
>>>>>> count?
>>>>>> Would it make more sense to store a list of their indices?
>>>>>
>>>>> Actually it is no more than a simple prime sieve implemented as a
>>>>> Python class (and, yes, I realize that there are plenty of these
>>>>> around).
>>>>
>>>> If I understand correctly, you have a list of roughly a billion
>>>> True/False values indicating which integers are prime and which are
>>>> not. You would like to discover how many prime numbers there are
>>>> between two numbers a and b. You currently do this by counting the
>>>> number of True values in your list between the indices a and b.
>>>>
>>>> If my description is correct then I would definitely consider using a
>>>> different algorithmic approach. The density of primes from 1 to 1
>>>> billion is about 5%. Storing the prime numbers themselves in a sorted
>>>> list would save memory and allow a potentially more efficient way of
>>>> counting the number of primes within some interval.
>>>
>>> In fact it is probably quicker if you don't mind using all that memory
>>> to just store the cumulative sum of your prime True/False indicator
>>> list. This would be the prime counting function pi(n). You can then
>>> count the primes between a and b in constant time with pi[b] - pi[a].
>>
>> I did wonder whether, after creating the sieve, I should simply go
>> through the list and replace the True values with a count.  This would
>> certainly speed up the prime count function, which is where the issue
>> arises.  I will try this and see what sort of performance trade-offs
>> this involves.
>>
> 
> By doing that replacement, you'd increase memory usage manyfold (maybe
> 3:1, I don't know).  As long as you're only using bools in the list, you
> only have the list overhead to consider, because all the objects
> involved are already cached (True and False exist only once each).  If
> you have integers, you'll need a new object for each nonzero count.
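
A rough way to see the effect Dave describes, assuming CPython: the sketch below adds the size of a list to the sizes of the distinct objects it references. The helper name footprint and the sample sizes are made up for illustration, and sys.getsizeof only reports per-object sizes, so the figures are approximate and vary between builds.

```python
import sys

def footprint(lst):
    """Approximate bytes used by a list plus the distinct objects it references."""
    # Key by id() so an object referenced many times is only counted once.
    distinct = {id(x): x for x in lst}.values()
    return sys.getsizeof(lst) + sum(sys.getsizeof(x) for x in distinct)

n = 100_000

# A list of bools only ever references the two singletons True and False.
flags = [True, False] * (n // 2)

# A list of counts references a separate int object for each value
# (values start at 1000 to stay clear of CPython's small-int cache).
counts = list(range(1000, 1000 + n))

print("bools:", footprint(flags), "bytes")
print("ints: ", footprint(counts), "bytes")
```

The list itself costs the same in both cases (one pointer per element); the difference comes entirely from the int objects that the second list has to keep alive.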

Thank you, Dave, you have answered a question that I was going to ask
before I even asked it!
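
For reference, one possible way to keep Oscar's pi[b] - pi[a] idea without paying for an int object per entry is to hold the running counts in an array.array of unsigned ints. This is only a sketch under assumptions: the function names sieve and prime_counts are hypothetical, the sieve is a plain bytearray version rather than the class discussed above, and the "I" typecode is assumed to be 4 bytes wide, which is typical but not guaranteed.

```python
from array import array
from itertools import accumulate
from math import isqrt

def sieve(limit):
    """Return a bytearray with flags[i] == 1 iff i is prime (assumes limit >= 2)."""
    flags = bytearray([1]) * (limit + 1)
    flags[0] = flags[1] = 0
    for p in range(2, isqrt(limit) + 1):
        if flags[p]:
            # Clear every multiple of p from p*p upwards in one slice assignment.
            flags[p * p :: p] = bytes(len(range(p * p, limit + 1, p)))
    return flags

def prime_counts(flags):
    """Return pi, where pi[n] is the number of primes <= n, as packed unsigned ints."""
    return array("I", accumulate(flags))

pi = prime_counts(sieve(100))

# Number of primes p with a < p <= b, in constant time.
a, b = 10, 100
print(pi[b] - pi[a])
```

Note that pi[b] - pi[a] counts the primes in the half-open range a < p <= b; to include a itself when it is prime, use pi[b] - pi[a - 1] for a >= 1.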