List Count

Tue Apr 23 14:30:11 EDT 2013

On 23/04/2013 18:45, Oscar Benjamin wrote:

>>>> I did a lot of work comparing the overall performance of the sieve when
>>>> using either lists or arrays and I found that lists were a lot faster
>>>> for the majority use case when the sieve is not large.
>>>
>>> And when the sieve is large?
>>
>> I don't know but since the majority use case is when the sieve is small,
>> it makes sense to choose a list.
> 
> That's an odd comment given what you said at the start of this thread:
> 
> Blind Anagram wrote:
>> I would be grateful for any advice people can offer on the fastest way
>> to count items in a sub-sequence of a large list.
>>
>> I have a list of boolean values that can contain many hundreds of
>> millions of elements for which I want to count the number of True values
>> in a sub-sequence, one from the start up to some value (say hi).

At this early stage in the discussion I was simply explaining the
immediate context of the problem on which I was seeking advice.

Here I didn't think it was necessary to expand on the wider context
since there might have been an very easy way that people could suggest
for avoiding the slice copy when counting on a part of a list.

But there isn't, so the wider details then became important in
explaining why some proposals might work for the limiting case but would
not make sense within the overall context of use.  And here I have said
on more than one occasion that the huge sieve case is a minority use case.

[snip]
> You keep mentioning that you want it to work with a large sieve. I
> would much rather compute the same quantities with a small sieve if
> possible. If you were using the Lehmer/Meissel algorithm you would be
> able to compute the same quantity (i.e. pi(1e9)) using a much smaller
> sieve with 30k items instead of 1e9. that would fit *very* comfortably
> in memory and you wouldn't even need to slice the list. Or to put it
> another way, you could compute pi(~1e18) using your current sieve
> without slicing or paging. If you want to lift the limit on computing
> pi(x) this is clearly the way to go.

If prime_pi for huge numbers was really important to me I wouldn't be
using Python!

This is just one of a number of functions implemented in the class.  It
is nice to have and it is also nice for testing purposes to be able to
run it for large sieves. But it is by no means important enough to
devote dedicated code to computing it. The prime generator within the
class is far more important and is the workhorse for most uses

[snip]
>> In other words, the lack of a list.count(value, limit) function makes
>> Python less effective than it would otherwise be.  I haven't looked at
>> Python's C code base but I still wonder if there a good reason for NOT
>> providing this?
> 
> If you feel that this is a good suggestion for an improvement to
> Python consider posting it on python-ideas. I wasn't aware of the
> equivalent functionality on strings but I see that the tuple.count()
> function is the same as list.count().

To be honest, I am not really in a position to judge whether this is a
'good' suggestion.  It has turned up as potentially useful in two cases
for me but one of these (the one discussed here) is a minority use case.
 I would also be a bit worried about launching this into a group of
Python experts who will almost certainly be even more demanding of
explanations than you folk here!

   Brian