finding items that occur more than once in a list

Fri Mar 21 10:11:37 EDT 2008

On Mar 21, 4:17 am, Bryan Olson <fakeaddr... at nowhere.org> wrote:
> Arnaud Delobelle wrote:
> > Bryan Olson wrote:
> [...]
> >> Arnaud Delobelle offered a good Wikipedia link, and for more background
> >> look up "amortized analysis.
>
> > Hrvoje Niksic provided the link :).
>
> Oops, careless thread-following. Hrvoje Niksic it was.
>
> > I still think two unrelated
> > things are being compared in this thread when people say that method A
> > (using dictionaries / sets) is O(n) and method B (sorting the list)
> > O(nlogn).
>
> > Isn't it the case that:
>
> >          | Worst case | Average case
> > ---------|------------|-------------
> > Method A | O(n^2)     | O(n)
> > Method B | O(nlogn)   | O(nlogn)
>
> > Which method is the best would then be determined by the distribution
> > of the hash values of your data, and whether you want to have a
> > guarantee the method will have a less-than-quadratic behaviour.
>
> If we exclude the case where an adversary is choosing the keys, the
> chance of a seriously degenerate case in the hashing method is so
> remote that we do should not worry about it. Those who insist on
> ranking by worst-case behavior have probably abandoned Python long
> ago, as it uses those hashed 'dict' things everywhere. Of course
> they've also abandoned OS's with demand-paged virtual memory and
> such.
>
> --
> --Bryan

A collision sequence is not so rare.

>>> [ hash( 2**i ) for i in range( 0, 256, 32 ) ]
[1, 1, 1, 1, 1, 1, 1, 1]