[Python-Dev] Counting collisions for the win

Glenn Linderman v+python at g.nevcal.com
Mon Jan 23 21:15:36 CET 2012


On 1/23/2012 10:58 AM, Frank Sievertsen wrote:
>
>
> On 23.01.2012 19:25, Glenn Linderman wrote:
>> So this sounds like SafeDict, but putting it under the covers and 
>> automatically converting from dict to SafeDict after a collision 
>> threshold has been reached.  Let's call it fallback-dict.
>>
>> and costs:
>>
>> * converting the dict from one hash to the other by rehashing all the 
>> keys.
>
> That's not exactly what it does, it calls the randomized hash-function 
> only for those
> keys, that that didn't find a free slot after 20 collision. And it 
> uses this value only for
> the further collision resolution.
>
> So the value of hash() is used for the first 20 slots, 
> randomized_hash() is used
> after that.
>
> 1st try: slot[i = perturb = hash];
> 2nd try: slot[i=i * 5 + 1 + (perturb >>= 5)]
> 3rd try: slot[i=i * 5 + 1 + (perturb >>= 5)]
> ....
> 20th try: slot[i= i * 5 + 1 + (perturb >>= 5)]
> 21th try: slot[i= perturb = randomized_hash(key)] <---- HERE
> 22th try: slot[i= i * 5 + 1 + (perturb >>= 5)]
>
> This is also why there is no conversion needed. It's a
> per-key/per-lookup rule.
>
> Frank

Interesting idea, and I see it would avoid conversions.  What happens if 
the data area also removed from the hash?  So you enter 20 colliding 
keys, then 20 more that get randomized, then delete the 18 of the first 
20.  Can you still find the second 20 keys? Takes two sets of probes, 
somehow?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20120123/c5437108/attachment.html>


More information about the Python-Dev mailing list