[Tutor] Select distinct item from list

Gregor Lingl glingl@aon.at
Mon Feb 24 05:32:02 2003


janos.juhasz@VELUX.com wrote:

>Dear All,
>
>Can someone show me a simple list comprehension that does the same thing as
>"Select distinct item from list" does in SQL?
>
>So I have a list
>
>>>> l = (1,2,3,4,5,5,6,7,7,7,2)
>
>but I would like to have just
>l = (1,2,3,4,5,6,7)
>
>I know I have seen this somewhere, but I cannot find it :(
>
Me neither! And I also can't figure out a clear and fast example.

But here are two functions that do what you need. The second one is
*much* faster than the first, although it takes a detour through a dictionary:

 >>> def uniques(list):
...     u=[]
...     for l in list:
...         if l not in u:
...             u.append(l)
...     return u
...
 >>> def distincts(list):
...     d = {}
...     for l in list:
...         d[l]=None
...     return d.keys()
...
 >>> from time import clock
 >>> from random import randrange
 >>> example = [randrange(100) for i in range(1000)]
 >>> if 1:
...     a=clock()
...     result1 = uniques(example)
...     b=clock()
...     result2 = distincts(example)
...     c=clock()
...     print b-a, c-b
...    
0.0271014815845 0.00253942818447
 >>> example = [randrange(1000) for i in range(1000)]
 >>> if 1:
...     a=clock()
...     result1 = uniques(example)
...     b=clock()
...     result2 = distincts(example)
...     c=clock()
...     print b-a, c-b
...
0.15894725197 0.00390971368995
 >>> len(result1)
637
 >>> len(result2)
637
 >>> result1
[798, 106, 230, 694, 163, 709, 666, 29, 481, 115, 682, 467, 872, 195, 
311, 800, 420, 423, 881, ...
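
Why the difference? The test "l not in u" has to search the list u each
time, while looking up a key in a dictionary takes roughly constant time on
average. On the other hand, d.keys() is not guaranteed to keep the order in
which the items first appeared, whereas uniques does keep it. Here is a
minimal sketch that combines both ideas, assuming the items are hashable
(ordered_distincts is only a name made up for this illustration):

```python
def ordered_distincts(items):
    # 'seen' is used only for fast membership tests;
    # 'result' keeps the items in order of first appearance.
    seen = {}
    result = []
    for item in items:
        if item not in seen:
            seen[item] = None
            result.append(item)
    return result

print(ordered_distincts((1, 2, 3, 4, 5, 5, 6, 7, 7, 7, 2)))
# prints [1, 2, 3, 4, 5, 6, 7]
```

It returns a list, so wrap the result in tuple() if a tuple is wanted.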

Regards, Gregor

P.S.: The following is also possible, but certainly not what you had in 
mind:

 >>> def weird(list):
...     e = []
...     u = [e.append(l) for l in list if l not in e]
...     return e
...
 >>> result3 = weird(example)
 >>> result3[:20]
[798, 106, 230, 694, 163, 709, 666, 29, 481, 115, 682, 467, 872, 195, 
311, 800, 420, 423, 881, 8]
 >>> len(result3)
637

OOPS! Michael's idea just arrived (including a typo):

 >>> def uniqs(inp, was_there=[]):
...   if not inp in was_there:
...     was_there.append(inp)
...     return 1  # sending "True" to filter
...   
 >>> if 1:
...     a = clock()
...     result4 = filter(uniqs, example)
...     b = clock()
...     print b - a, len(result4)
...    
0.167666793498 637

It uses a similar idea to my first example.
!!! But it has a severe disadvantage, as you can see
if you use it twice: !!!

 >>> if 1:
...     a = clock()
...     result4 = filter(uniqs, example)
...     b = clock()
...     print b - a, len(result4)
...
0.168840126653 0
 >>> result4
[]
 >>>

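Now the resulting list is empty! This comes from using
a mutable object, namely a list, as the default value for a parameter.
The default list is created only once, when the function is defined,
and is shared by all later calls. After the first run was_there already
contains all the numbers in example, so the second run filters out everything.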

So, I think, it's better to discard that idea. Sorry.
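
If one wanted to rescue the idea anyway, one way is to avoid the shared
default value. The sketch below is only an illustration under that
assumption (make_uniqs is a hypothetical name, not part of Michael's post):
it builds a fresh was_there list for every run. It still searches a list,
so it should be about as slow as my first example.

```python
def make_uniqs():
    # Hypothetical helper: every call creates a new, empty 'was_there',
    # so nothing is remembered from one filter() run to the next.
    was_there = []

    def uniqs(inp):
        if inp not in was_there:
            was_there.append(inp)
            return True   # keep the item
        return False      # already seen, drop it

    return uniqs

example = [3, 1, 3, 2, 1]
first = list(filter(make_uniqs(), example))
second = list(filter(make_uniqs(), example))
print(first)   # prints [3, 1, 2]
print(second)  # prints [3, 1, 2]
```

Both runs now give the same result, because each call to make_uniqs()
starts with its own empty list.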




>Please CC me.
>Best regards,
>-----------------------
>Juhász János
>IT department