Counting number of each item in a list.

Bruno Desthuilliers bdesth.quelquechose at free.quelquepart.fr
Sun Mar 19 17:14:31 EST 2006


Bruno Desthuilliers a écrit :
> Kent Johnson a écrit :
> 
>> sophie_newbie wrote:
>>
>>> Hey Bruno,
>>>
>>> I got an invalid syntax error when i tried using your "str_counts =
>>> dict((s, str_list.count(s) for s in set(str_list))" bit of code? Maybe
>>> there is a missing bracket or comma? Or maybe I need to import
>>> something.
>>
>>
>>
>> It should be
>> str_counts = dict((s, str_list.count(s)) for s in set(str_list))
> 
> 
> Of course, my bad :(
> 
>> or for Python < 2.4
> 
> 
> from sets import Set as set
> 
>> str_counts = dict([(s, str_list.count(s)) for s in set(str_list)])
>>
>> Note that this solution iterates str_list once for each element of 
>> str_list
> 
> 
> once for each *distinct* element or str_list (it iterates over the set 
> created from str_list).
> 
>> - the call to count traverses the entire list to create the count. I 
>> expect Paul Rubin's solution will be dramatically faster for large 
>> lists as it only iterates str_list once. 
> 
> Yeps. But I would benchmark it before choosing one or the other solution.

And of course, I was right. My solution seems to be faster than Paul's 
one (but slower than bearophile's), be it on small, medium or large lists.

nb: A is mine, B is Paul's and C is bearophile's, and the number after 
is the size of the list...

A100 (10000 times): 1.5801050663
B100 (10000 times): 1.87287902832
C100 (10000 times): 0.991976976395
A10000 (100 times): 1.083589077
B10000 (100 times): 1.30713891983
C10000 (100 times): 0.988032817841
A1000000 (10 times): 10.5345788002
B1000000 (10 times): 13.094493866
C1000000 (10 times): 9.67438292503


source:

def countA(lst):
     return dict((s, lst.count(s)) for s in set(lst))

def countB(lst):
     counts = {}
     for s in lst:
         counts[s] = counts.get(s, 0)+1
     return counts

def countC(lst):
     counts = {}
     for s in lst:
         if s in counts:
             counts[s] += 1
         else:
             counts[s] = 1
     return counts

def mklst(ln):
     from random import choice
     items = ['abc', 'def', 'ghi', 'jkl', 'mno',
              'pqr', 'stu', 'vwx', 'yz']
     return [choice(items) for i in range(ln)]

lst100 = mklst(100)
lst10000 = mklst(10000)
lst1000000 = mklst(1000000)

def run():
     from timeit import Timer

     timers = [
         ('A100',
          Timer('countA(lst100)',
                'from __main__ import countA, lst100'),
          10000),
         ('B100',
          Timer('countB(lst100)',
                'from __main__ import countB, lst100'),
          10000),
         ('C100',
          Timer('countC(lst100)',
                'from __main__ import countC, lst100'),
          10000),
         ('A10000',
          Timer('countA(lst10000)',
                'from __main__ import countA, lst10000'),
          100),
         ('B10000',
          Timer('countB(lst10000)',
                'from __main__ import countB, lst10000'),
          100),
         ('C10000',
          Timer('countC(lst10000)',
                'from __main__ import countC, lst10000'),
          100),
         ('A1000000',
          Timer('countA(lst1000000)',
                'from __main__ import countA, lst1000000'),
          10),
         ('B1000000',
          Timer('countB(lst1000000)',
                'from __main__ import countB, lst1000000'),
          10),
         ('C1000000',
          Timer('countC(lst1000000)',
                'from __main__ import countC, lst1000000'),
          10),
         ]

     for name, timer, repeat in timers:
         print "%s (%s times): %s" % (name,
                                      repeat,
                                      timer.timeit(repeat))







More information about the Python-list mailing list