[issue10408] Denser dicts and linear probing

Sat Dec 10 12:29:00 CET 2011

Charles-François Natali <neologix at free.fr> added the comment:

This paper might be of interest:
http://www.siam.org/meetings/alenex05/papers/13gheileman.pdf

Basically, it concludes that most of the time, there's no speedup to be gained from the increased cached locality incurred by linear probing when the input sample differ from a uniform distribution. However, figure 1 shows a clear win with a uniform distribution, and figure 2 shows that even with a non-uniform distribution, there's still a gain as the load factor increases.
So if experiments show measurable improvements, this might be an interesting idea.
However, the problem with linear probing is that the performance will be much more dependent on the input distribution that with quadratic probing/double hashing, so this will definitely degrade with some workloads.

> The problem with linear probing, as noted in the comments, is that it 
> degrades performance quite a lot when the hashing function clusters 
> results. The solution I'm proposing is to apply an *initial* 
> perturbation, by multiplying the hash() with a prime number. 

I fail to see how this would avoid primary clustering: IIUC, you replace the hash function h(k) by h'(k), but you still use linear probing: an empty slot preceded by i filled slots has a probablility (i+1)/N of getting filled next (N being the total number of slots).

----------
nosy: +neologix

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue10408>
_______________________________________