List Count

Tue Apr 23 21:59:34 EDT 2013

On Tue, 23 Apr 2013 17:57:17 +0100, Blind Anagram wrote:

> On 23/04/2013 15:49, Steven D'Aprano wrote:
>> On Tue, 23 Apr 2013 08:05:53 +0100, Blind Anagram wrote:
>> 
>>> I did a lot of work comparing the overall performance of the sieve
>>> when using either lists or arrays and I found that lists were a lot
>>> faster for the majority use case when the sieve is not large.
>> 
>> And when the sieve is large?
> 
> I don't know but since the majority use case is when the sieve is small,
> it makes sense to choose a list.
> 
>> I don't actually believe that the bottleneck is the cost of taking a
>> list slice. That's pretty fast, even for huge lists, and all efforts to
>> skip making a copy by using itertools.islice actually ended up slower.
>> But suppose it is the bottleneck. Then *sooner or later* arrays will
>> win over lists, simply because they're smaller.
> 
> Maybe you have not noticed that, when I am discussing a huge sieve, I am
> simply pushing a sieve designed primarily for a small sieve lengths to
> the absolute limit.  This is most definitely a minority use case.

You don't say? I hadn't noticed the first three hundred times you 
mentioned it :-P

In my opinion, it is more important to be efficient for large sieves, not 
small. As they say, for small N, everything is fast. Nobody is going to 
care about the difference between small-N taking 1ms or 10ms, but they 
will care about the difference between big-N taking 1 minute or 1 hour. 
So, as a general rule, optimize the expensive cases, and if the cheap 
cases are a little less cheap than they otherwise would have been, nobody 
will care or notice.

Of course, it's your code, and you're free to make whatever trade-offs 
between time and space and programmer effort that you feel are best. I'm 
just making a suggestion.

[...]
> In other words, the lack of a list.count(value, limit) function makes
> Python less effective than it would otherwise be.  I haven't looked at
> Python's C code base but I still wonder if there a good reason for NOT
> providing this?

The same reasons for not providing any other of an infinite number of 
potential features. You have to weigh up the potential benefits of the 
feature against the costs:

- Is this a good use of developer time to build the feature?

- Every new feature requires somebody to write it, somebody to check it, 
somebody to write tests for it, somebody to document it. Who is going to 
do all this work?

- Every new feature increases the size and complexity of the language and 
makes it harder for people to learn.

- Does this new feature actually have the advantages claimed? Many so-
called optimizations actually turn out to be pessimizations instead.

- Will the new feature be an attractive nuisance, fooling programmers 
into using it inappropriately?

In this case, I would say that adding a limit argument to list.count is 
*absolutely not* worthwhile, because it is insufficiently general. To be 
general enough to be worthwhile, you would have to add three arguments:

list.count(value, start=0, end=None, step=1)

But even there I don't think it's general enough to cover the costs. I 
believe that a better solution would be to create list views to offer a 
subset of the list without actually copying.

-- 
Steven