[issue30671] dict: simplify and improve lookup function

Tim Peters report at bugs.python.org
Thu Jun 15 23:21:03 EDT 2017


Tim Peters added the comment:

Whatever the iteration, accept that it's necessary it reach every value in range(2**i) eventually.  The current scheme does that, for reasons already documented.  You seem to be overlooking the importance of this part:

"""
Note that because perturb is unsigned, if the recurrence is executed often enough perturb eventually becomes and remains 0.  At that point (very rarely reached) the recurrence is on (just) 5*j+1 again, and that's certain to find an empty slot eventually (since it generates every int in range(2**i), and we make sure there's always at least one empty slot).
"""

5 is the smallest multiplier (besides the degenerate 1) for which that's true for every power-of-2 modulus.  It doesn't matter how rare collisions are if you switch to a scheme for which it's _not_ always true:  then you risk an infinite loop failing to find an empty slot.

Also note that testing was not just done on "random" cases, but "on both normal and pathological cases".  For example, integer keys all of which have lots of trailing zero bits (there's a specific example of that too in the comments).  The smaller PERTURB_SHIFT, the longer it takes to get the high-order bits into play - and the longer it takes to fall back to a pure "5*j+1" iteration, which is the part that's necessary for _correctness_.

----------
nosy: +tim.peters

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue30671>
_______________________________________


More information about the Python-bugs-list mailing list