Eliminating duplicates entries from a list efficiently

François Pinard pinard at iro.umontreal.ca
Fri Jul 2 22:56:12 EDT 2004


[Roy Smith]
> [Paul]

> > Can anyone suggest an efficient way to eliminate duplicate entries
> > in a list?  The naive approach below works fine, but is very slow
> > with lists containing tens of thousands of entries:

> Something like:

> d = {}
> for item in list:
>    d[item] = True
> list = d.keys()

> This should do it and will run in, if I'm not mistaken, O(n).  The
> only problem is that it scrambles the order of the items in the list,
> which may or may not be a problem, depending on your application.

If order is not important, here is a short idiom:

    from sets import Set as set
    d = list(set(d))

This may be slower than the dictionary solution in current Python.  Sets
were experimental, but as they are stabilising, I think there are plans
now to make them faster in subsequent Python releases.  So, it might be
a good bet, learning to depend on them in the long run.

-- 
François Pinard   http://www.iro.umontreal.ca/~pinard




More information about the Python-list mailing list