listerator clonage

Sat Feb 12 17:33:31 EST 2005

[Cyril BAZIN]
> I want to build a function which return values which appear two or
> more times in a list:
>
> So, I decided to write a little example which doesn't work:
> #l = [1, 7, 3, 4, 3, 2, 1]
> #i = iter(l)
> #for x in i:
> #    j = iter(i)
> #    for y in j:
> #        if x == y:
> #            print x

Here's one way:

s = [1, 7, 3, 4, 3, 2, 1]
seen = []
dups = []
for elem in s:
    if elem in seen:
         if elem not in dups:
             dups.append(elem)
    else:
         seen.append(elem)
print dups

This snippet has the virtue of not outputting the duplicate value more than once
and it does not rely on the values being hashable or ordered.   It has the vice
of running at O(n**2) speed.  A further property is that the values are output
in the order seen.

If the values are ordered but not hashable, itertools.groupby offers a faster
O(n lg n) solution that outputs the values in sorted order:

import itertools
print [k for k, g in itertools.groupby(sorted(s)) if len(list(g))>1]

If the values are known to be hashable, dictionaries offer an even faster O(n)
solution that outputs the values in arbitrary order:

bag = {}
for elem in s:
    bag[elem] = bag.get(elem, 0) + 1
print [elem for elem, count in bag.iteritems() if count>1]

Raymond Hettinger

P.S.  Extra credit problem:  make the itertools solution output values in the
order seen.