Eliminating duplicates entries from a list efficiently
François Pinard
pinard at iro.umontreal.ca
Fri Jul 2 22:56:12 EDT 2004
[Roy Smith]
> [Paul]
> > Can anyone suggest an efficient way to eliminate duplicate entries
> > in a list? The naive approach below works fine, but is very slow
> > with lists containing tens of thousands of entries:
> Something like:
> d = {}
> for item in list:
> d[item] = True
> list = d.keys()
> This should do it and will run in, if I'm not mistaken, O(n). The
> only problem is that it scrambles the order of the items in the list,
> which may or may not be a problem, depending on your application.
If order is not important, here is a short idiom:
from sets import Set as set
d = list(set(d))
This may be slower than the dictionary solution in current Python. Sets
were experimental, but as they are stabilising, I think there are plans
now to make them faster in subsequent Python releases. So, it might be
a good bet, learning to depend on them in the long run.
--
François Pinard http://www.iro.umontreal.ca/~pinard
More information about the Python-list
mailing list