[Tutor] Looking for duplicates within a list

Peter Otten __peter__ at web.de
Sat Jun 12 13:19:37 CEST 2010


Ken G. wrote:

> I have been working on this problem for several days and I am not making
> any progress.  I have a group of 18 number, in ascending order, within a
> list.  They ranged from 1 to 39.  Some numbers are duplicated as much as
> three times or as few as none.
> 
> I started with one list containing the numbers.  For example, they are
> listed as like below:
> 
> a = [1, 2, 3, 3, 4]
> 
> I started off with using a loop:
> 
>     for j in range (0, 5):
>     x = a[0] # for example, 1
> 
> How would I compare '1' with 2, 3, 3, 4?
> 
> Do I need another duplicated list such as b = a and compare a[0] with
> either b[0], b[1], b[2], b[3], b[4]?
> 
> Or do I compare a[0] with a[1], a[2], a[3], a[4]?
> 
> In any event, if a number is listed more than once, I would like to know
> how many times, such as 2 or 3 times.  For example, '3' is listed twice
> within a list.

I haven't seen a solution that takes advantage of the fact that the numbers 
are sorted, so here's one:

>>> items = [1, 2, 3, 3, 4]
>>> value = items[0]       
>>> n = 0
>>> result = []
>>> for item in items:
...     if value != item:
...             print value, "-->", n
...             value = item
...             n = 1       
...     else:               
...             n += 1      
...                         
1 --> 1                     
2 --> 1
3 --> 2
>>> print value, "-->", n
4 --> 1

Python has a cool feature called generator that allows you to encapsulate 
the above nicely:

>>> def count_runs(items):
...     it = iter(items)
...     value = next(it)
...     n = 1 # next consumes one item
...     for item in it: # the for loop starts with the second item
...             if value != item:
...                     yield value, n
...                     value = item
...                     n = 1
...             else:
...                     n += 1
...     yield value, n
...
>>> for v, n in count_runs([1, 1, 1, 2, 2, 3, 4, 4, 4, 4]):
...     print v, "-->", n
...
1 --> 3
2 --> 2
3 --> 1
4 --> 4

Finally, there is itertools.groupby() which allows you to simplify the 
implementation count_runs():

>>> from itertools import groupby
>>> def count_runs2(items):
...     for key, group in groupby(items):
...             yield key, sum(1 for _ in group)
...
>>> for v, n in count_runs2([1, 1, 1, 2, 2, 3, 4, 4, 4, 4]):
...     print v, "-->", n
...
1 --> 3
2 --> 2
3 --> 1
4 --> 4

Peter




More information about the Tutor mailing list