collections.Counter surprisingly slow

Mon Jul 29 07:49:53 EDT 2013

On 29 July 2013 07:25, Serhiy Storchaka <storchaka at gmail.com> wrote:

> 28.07.13 22:59, Roy Smith написав(ла):
>
>    The input is an 8.8 Mbyte file containing about 570,000 lines (11,000
>> unique strings).
>>
>
> Repeat you tests with totally unique lines.

Counter is about ½ the speed of defaultdict in that case (as opposed to ⅓).

>  The full profiler dump is at the end of this message, but the gist of
>> it is:
>>
>
> Profiler affects execution time. In particular it slowdown Counter
> implementation which uses more function calls. For real world measurement
> use different approach.

Doing some re-times, it seems that his originals for defaultdict, exception
and Counter were about right. I haven't timed the other.

>  Why is count() [i.e. collections.Counter] so slow?
>>
>
> Feel free to contribute a patch which fixes this "wart". Note that Counter
> shouldn't be slowdowned on mostly unique data.

I find it hard to agree that counter should be optimised for the
unique-data case, as surely it's much more oft used when there's a point to
counting?

Also, couldn't Counter just extend from defaultdict?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-list/attachments/20130729/5ee9d686/attachment.html>