[Python-3000] Performance Notes - new hash algorithm

Nicko van Someren nicko at nicko.org
Thu Sep 13 01:33:07 CEST 2007


On 10 Sep 2007, at 01:58, Jim Jewett wrote:
> To spell this out a bit more:
> ...
> When adding four entries to an 8-slot table, a truly random hash would
> have at least one collision (0/8 + 1/8 + 2/8 + 3/8 =) 3/4  of the
> time.  As expected, the proposed hash does have a collision for those
> four values (the first and fourth).

While your over-all analysis is both informative and helpful, the  
pedant in me feels obliged to point out the flaw in your math.  The  
probability of at least one collision is 1 minus the probability of  
no collision, which is in turn 8/8 * 7/8 * 6/8 * 5/8, so the correct  
figure is actually that you collide about 59% of the time, not 75%.

(If your math were correct then 5 items would collide 125% of the  
time, which is clearly wrong! :-)

	Cheers,
		Nicko





More information about the Python-3000 mailing list