collections.Counter surprisingly slow

Sun Jul 28 16:51:11 EDT 2013

On Sun, 28 Jul 2013 15:59:04 -0400, Roy Smith wrote:

[...]
> I'm rather shocked to discover that count() is the slowest
> of all!  I expected it to be the fastest.  Or, certainly, no slower than
> default().
> 
> The full profiler dump is at the end of this message, but the gist of it
> is:
> 
> ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>     1    0.000    0.000    0.322    0.322 ./stations.py:42(count)
>     1    0.159    0.159    0.159    0.159 ./stations.py:17(test)
>     1    0.114    0.114    0.114    0.114 ./stations.py:27(exception)
>     1    0.097    0.097    0.097    0.097 ./stations.py:36(default)
> 
> Why is count() [i.e. collections.Counter] so slow?

It's within a factor of 2 of test, and 3 of exception or default (give or 
take). I don't think that's surprisingly slow. In 2.7, Counter is written 
in Python, while defaultdict has an accelerated C version. I expect that 
has something to do with it.
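
For reference, here is a rough sketch of the four counting strategies being compared. The original stations.py is not shown in this message, so the function bodies and names below are my assumptions about what test, exception, default and count do, not the actual code:

```python
from collections import Counter, defaultdict

# Hypothetical reconstructions of the four approaches; the real
# stations.py is not shown, so these bodies are assumptions.

def count_with_test(items):
    # Check for membership first, then update.
    counts = {}
    for item in items:
        if item in counts:
            counts[item] += 1
        else:
            counts[item] = 1
    return counts

def count_with_exception(items):
    # Try the update and handle the first-time KeyError.
    counts = {}
    for item in items:
        try:
            counts[item] += 1
        except KeyError:
            counts[item] = 1
    return counts

def count_with_default(items):
    # defaultdict(int) supplies 0 for missing keys.
    counts = defaultdict(int)
    for item in items:
        counts[item] += 1
    return counts

def count_with_counter(items):
    # Counter does the whole loop internally.
    return Counter(items)

sample = ["a", "b", "a", "c", "a", "b"]
results = [dict(f(sample)) for f in (count_with_test,
                                     count_with_exception,
                                     count_with_default,
                                     count_with_counter)]
```

All four should produce the same mapping; they differ only in how much work is done per element and whether that work happens in Python or in C.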

Calling Counter ends up calling essentially this code:

for elem in iterable:
    self[elem] = self.get(elem, 0) + 1

(although micro-optimized), where "iterable" is your data (lines). 
Calling the get method has higher overhead than dict[key]; that will also 
contribute.
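
If you want to see the lookup overhead in isolation, something like the following should do. It is only a micro-benchmark sketch: the dict, the key and the repeat count are arbitrary choices of mine, and the absolute numbers will vary with the Python version and the machine.

```python
import timeit

# Arbitrary one-entry dict, purely for illustration.
d = {"key": 0}

def with_get():
    # Method lookup plus a call, as in the loop Counter uses.
    return d.get("key", 0) + 1

def with_subscript():
    # Plain subscript, as in the test/exception/default versions.
    return d["key"] + 1

get_time = timeit.timeit(with_get, number=200000)
sub_time = timeit.timeit(with_subscript, number=200000)
print("get: %.4f s, subscript: %.4f s" % (get_time, sub_time))
```

Both functions pay the same function-call cost from timeit, so whatever difference remains is down to d.get(...) versus d[...].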

-- 
Steven