finding items that occur more than once in a list

Bryan Olson fakeaddress at nowhere.org
Sat Mar 22 04:31:45 EDT 2008


castironpi at gmail.com wrote:
> John Machin wrote:
>> On Mar 22, 1:11 am, castiro... at gmail.com wrote:
>>> A collision sequence is not so rare.
>>>>>> [ hash( 2**i ) for i in range( 0, 256, 32 ) ]
>>> [1, 1, 1, 1, 1, 1, 1, 1]

>> Bryan did qualify his remarks: "If we exclude the case where an
>> adversary is choosing the keys, ..."
> 
> Some adversary.  What, you mean, my boss or my customers?

We mean that the party supplying the keys deliberately chose
them to make the hash table inefficient. In this thread the goal
is efficiency; a party working toward an opposing goal is an
adversary.

If you find real-world data sets that tend to induce bad-case
behavior in Python's hash, please do tell. It would be reason
enough to adjust the hash function. The hashes in popular
software such as Python are already quite well vetted.

Even a hash function that behaves as a random oracle has
worst-case quadratic-time in the algorithm here, but that's
an issue in theory and not in serving customers.


-- 
--Bryan



More information about the Python-list mailing list