finding items that occur more than once in a list

Bryan Olson fakeaddress at nowhere.org
Fri Mar 21 05:17:10 EDT 2008


Arnaud Delobelle wrote:
> Bryan Olson wrote:
[...]
>> Arnaud Delobelle offered a good Wikipedia link, and for more background
>> look up "amortized analysis.
> 
> Hrvoje Niksic provided the link :). 

Oops, careless thread-following. Hrvoje Niksic it was.

> I still think two unrelated
> things are being compared in this thread when people say that method A
> (using dictionaries / sets) is O(n) and method B (sorting the list)
> O(nlogn).
> 
> Isn't it the case that:
> 
>          | Worst case | Average case
> ---------|------------|-------------
> Method A | O(n^2)     | O(n)
> Method B | O(nlogn)   | O(nlogn)
> 
> Which method is the best would then be determined by the distribution
> of the hash values of your data, and whether you want to have a
> guarantee the method will have a less-than-quadratic behaviour.

If we exclude the case where an adversary is choosing the keys, the
chance of a seriously degenerate case in the hashing method is so
remote that we do should not worry about it. Those who insist on
ranking by worst-case behavior have probably abandoned Python long
ago, as it uses those hashed 'dict' things everywhere. Of course
they've also abandoned OS's with demand-paged virtual memory and
such.


-- 
--Bryan



More information about the Python-list mailing list