Uniquifying a list?

Felipe Almeida Lessa felipe.lessa at gmail.com
Tue Apr 18 20:13:07 EDT 2006


On Tue, 2006-04-18 at 10:31 -0500, Tim Chase wrote:
> Is there an obvious/pythonic way to remove duplicates from a 
> list (resulting order doesn't matter, or can be sorted 
> postfacto)?  My first-pass hack was something of the form
> 
>  >>> myList = [3,1,4,1,5,9,2,6,5,3,5]
>  >>> uniq = dict([(k, None) for k in myList]).keys()
> 
> or alternatively
> 
>  >>> uniq = list(set(myList))
> 
> However, it seems like there's a fair bit of overhead 
> here...creating a dictionary just to extract its keys, or 
> creating a set, just to convert it back to a list.  It feels 
> like there's something obvious I'm missing here, but I can't 
> put my finger on it.

Your list with 11 elements (7 unique):

$ python2.4 -mtimeit -s 'x = [3,1,4,1,5,9,2,6,5,3,5]' 'y = dict((k,None) for k in x).keys()'
100000 loops, best of 3: 8.01 usec per loop

$ python2.4 -mtimeit -s 'x = [3,1,4,1,5,9,2,6,5,3,5]' $'y = []\nfor i in x:\n    if i not in y:\n        y.append(i)'
100000 loops, best of 3: 5.43 usec per loop

$ python2.4 -mtimeit -s 'x = [3,1,4,1,5,9,2,6,5,3,5]' 'y = list(set(x))'
100000 loops, best of 3: 3.57 usec per loop
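
For reference, the same comparison can also be driven from the timeit
module inside a script rather than from the shell. A rough sketch (the
labels and loop count are just illustrative, not the runs above):

import timeit

setup = 'x = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]'

statements = [
    ('dict keys', 'y = dict((k, None) for k in x).keys()'),
    ('membership loop',
     'y = []\nfor i in x:\n    if i not in y:\n        y.append(i)'),
    ('list(set(x))', 'y = list(set(x))'),
]

for name, stmt in statements:
    # Timer.timeit(number) returns the total time in seconds for `number` runs.
    total = timeit.Timer(stmt, setup).timeit(number=100000)
    print('%-16s %.2f usec per loop' % (name, total / 100000 * 1e6))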



A list with 100 000 elements (1 000 unique):

$ python2.4 -mtimeit -s 'x = range(1000) * 100' $'y = []\nfor i in x:\n    if i not in y:\n        y.append(i)'
10 loops, best of 3: 2.12 sec per loop

$ python2.4 -mtimeit -s 'x = range(1000) * 100' 'y = dict((k,None) for k in x).keys()'
10 loops, best of 3: 32.2 msec per loop

$ python2.4 -mtimeit -s 'x = range(1000) * 100' 'y = list(set(x))'
100 loops, best of 3: 6.09 msec per loop
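
The explicit loop is so slow on the big input because "i not in y" scans
the list every time, which makes it O(n**2) overall. If you need to keep
the first-seen order, an auxiliary set makes the same loop linear. A
small sketch (the function name is just illustrative, and it assumes the
items are hashable):

def uniq_ordered(seq):
    """Remove duplicates while keeping first-seen order."""
    seen = set()
    result = []
    for item in seq:
        if item not in seen:      # O(1) average lookup in a set
            seen.add(item)
            result.append(item)
    return result

print(uniq_ordered([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]))
# -> [3, 1, 4, 5, 9, 2, 6]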



"list(set(x))" is the clear winner with almost O(1) performance.
*However*, can't you always use "set" or "frozenset" instead of
converting back and forth?
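
If the rest of the code only needs iteration and membership tests, the
conversion back to a list can simply be dropped. A rough sketch of that
idea (sorting only at the point where order actually matters):

myList = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]

uniq = set(myList)             # no conversion back to a list needed
print(9 in uniq)               # membership tests stay O(1) on average
for value in sorted(uniq):     # sort only where order matters
    print(value)

frozen = frozenset(myList)     # immutable variant, usable as a dict key
counts = {frozen: len(myList)}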


HTH,

-- 
Felipe.



