Counting number of each item in a list.
Bruno Desthuilliers
bdesth.quelquechose at free.quelquepart.fr
Sun Mar 19 17:14:31 EST 2006
Bruno Desthuilliers a écrit :
> Kent Johnson a écrit :
>
>> sophie_newbie wrote:
>>
>>> Hey Bruno,
>>>
>>> I got an invalid syntax error when i tried using your "str_counts =
>>> dict((s, str_list.count(s) for s in set(str_list))" bit of code? Maybe
>>> there is a missing bracket or comma? Or maybe I need to import
>>> something.
>>
>>
>>
>> It should be
>> str_counts = dict((s, str_list.count(s)) for s in set(str_list))
>
>
> Of course, my bad :(
>
>> or for Python < 2.4
>
>
> from sets import Set as set
>
>> str_counts = dict([(s, str_list.count(s)) for s in set(str_list)])
>>
>> Note that this solution iterates str_list once for each element of
>> str_list
>
>
> once for each *distinct* element or str_list (it iterates over the set
> created from str_list).
>
>> - the call to count traverses the entire list to create the count. I
>> expect Paul Rubin's solution will be dramatically faster for large
>> lists as it only iterates str_list once.
>
> Yeps. But I would benchmark it before choosing one or the other solution.
And of course, I was right. My solution seems to be faster than Paul's
one (but slower than bearophile's), be it on small, medium or large lists.
nb: A is mine, B is Paul's and C is bearophile's, and the number after
is the size of the list...
A100 (10000 times): 1.5801050663
B100 (10000 times): 1.87287902832
C100 (10000 times): 0.991976976395
A10000 (100 times): 1.083589077
B10000 (100 times): 1.30713891983
C10000 (100 times): 0.988032817841
A1000000 (10 times): 10.5345788002
B1000000 (10 times): 13.094493866
C1000000 (10 times): 9.67438292503
source:
def countA(lst):
return dict((s, lst.count(s)) for s in set(lst))
def countB(lst):
counts = {}
for s in lst:
counts[s] = counts.get(s, 0)+1
return counts
def countC(lst):
counts = {}
for s in lst:
if s in counts:
counts[s] += 1
else:
counts[s] = 1
return counts
def mklst(ln):
from random import choice
items = ['abc', 'def', 'ghi', 'jkl', 'mno',
'pqr', 'stu', 'vwx', 'yz']
return [choice(items) for i in range(ln)]
lst100 = mklst(100)
lst10000 = mklst(10000)
lst1000000 = mklst(1000000)
def run():
from timeit import Timer
timers = [
('A100',
Timer('countA(lst100)',
'from __main__ import countA, lst100'),
10000),
('B100',
Timer('countB(lst100)',
'from __main__ import countB, lst100'),
10000),
('C100',
Timer('countC(lst100)',
'from __main__ import countC, lst100'),
10000),
('A10000',
Timer('countA(lst10000)',
'from __main__ import countA, lst10000'),
100),
('B10000',
Timer('countB(lst10000)',
'from __main__ import countB, lst10000'),
100),
('C10000',
Timer('countC(lst10000)',
'from __main__ import countC, lst10000'),
100),
('A1000000',
Timer('countA(lst1000000)',
'from __main__ import countA, lst1000000'),
10),
('B1000000',
Timer('countB(lst1000000)',
'from __main__ import countB, lst1000000'),
10),
('C1000000',
Timer('countC(lst1000000)',
'from __main__ import countC, lst1000000'),
10),
]
for name, timer, repeat in timers:
print "%s (%s times): %s" % (name,
repeat,
timer.timeit(repeat))
More information about the Python-list
mailing list