Removing duplicates from a list

Steven D'Aprano steve at REMOVETHIScyber.com.au
Wed Sep 14 11:26:54 EDT 2005


On Wed, 14 Sep 2005 13:28:58 +0100, Will McGugan wrote:

> Rubinho wrote:
>> I can't imagine one being much faster than the other except in the case
>> of a huge list and mine's going to typically have less than 1000
>> elements.  
> 
> I would imagine that 2 would be significantly faster. 

Don't imagine, measure.

Resist the temptation to guess. Write some test functions and time the two
different methods. But first test that the functions do what you expect:
there is no point having a blindingly fast bug.
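
A minimal sketch of that workflow follows. The two functions are hypothetical stand-ins for "method 1" (count-based) and "method 2" (set-based), not the code from the original post, and the test data shape is an assumption. Correctness is checked first, then timeit measures each function.

```python
import timeit

def dedupe_with_count(items):
    """Hypothetical 'method 1': keep each item the first time it is seen,
    using list.count on the result built so far."""
    result = []
    for x in items:
        if result.count(x) == 0:
            result.append(x)
    return result

def dedupe_with_set(items):
    """Hypothetical 'method 2': let a set discard the duplicates.
    Order is not preserved."""
    return list(set(items))

# Step 1: check correctness before timing anything.
sample = [3, 1, 3, 2, 1]
assert dedupe_with_count(sample) == [3, 1, 2]
assert sorted(dedupe_with_set(sample)) == [1, 2, 3]

# Step 2: time both on data shaped like the real use case
# (just under 1000 elements with duplicates; the shape is an assumption).
data = [i % 250 for i in range(999)]
t_count = timeit.timeit(lambda: dedupe_with_count(data), number=20)
t_set = timeit.timeit(lambda: dedupe_with_set(data), number=20)
print("count-based: %.4f s for 20 runs" % t_count)
print("set-based:   %.4f s for 20 runs" % t_set)
```

The numbers depend on the machine, the Python version, and how many duplicates the data contains, so run it against lists that resemble your own.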


> Method 1 uses 
> 'count' which must make a pass through every element of the list, which 
> would be slower than the efficient hashing that set does. 

But count makes its pass through the list in C, which is also very fast. Is
that faster or slower than the hashing code used by sets? I don't know, and
I'll bet you don't either.
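
One way to find out is to time the two primitives directly at a few list sizes. This is only a sketch: the helper name time_primitives, the sizes, and the test data are all assumptions made for illustration, and it compares one count call per element against a single set() construction.

```python
import timeit

def time_primitives(n, repeats=5):
    """Return (count_seconds, set_seconds) for a list of about n integers.

    count_seconds: one list.count call per element (each call is a full pass in C).
    set_seconds:   one set() construction over the whole list.
    """
    items = list(range(n // 2)) * 2  # every value appears twice
    count_s = timeit.timeit(lambda: [items.count(x) for x in items],
                            number=repeats)
    set_s = timeit.timeit(lambda: set(items), number=repeats)
    return count_s, set_s

for n in (10, 100, 1000):
    count_s, set_s = time_primitives(n)
    print("n=%5d  count: %.6f s  set: %.6f s" % (n, count_s, set_s))
```

Watching how the two columns grow with n says more than any single measurement does.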


-- 
Steven.



