[issue34751] Hash collisions for tuples

Tim Peters report at bugs.python.org
Tue Oct 2 14:52:29 EDT 2018


Tim Peters <tim at python.org> added the comment:

>> the author wants this transformation to be easily
>> invertible, so a prime is necessary

> A multiplication by any odd number modulo 2**64 is
> invertible.

Right!  My mistake.


> As I argued before, the concept of primes is
> meaningless (except for the prime 2) when computing
> modulo 2**64.

I don't know why they're using a prime.  But that doesn't mean they don't have "a reason".  You seem quick to dismiss things if they don't make instant sense to you at first glance.  I've noted before, e.g., that sticking to a prime eliminates a world of regular bit patterns in the multiplier.

The original SeaHash used a different prime, and a fixed right shift of 32 (but twice in different places).

Switching to the current prime, and the variable shift, can be traced back to this long comment on Reddit:

https://www.reddit.com/r/rust/comments/5fdf8z/seahash_a_blazingly_fast_portable_hash_function/dakdii1/

The prime isn't "random":  it was constructed so that flipping a low-order bit to 1 would directly affect at least one of the topmost 4 bits, which in turn are used to select the shift count.  While that doesn't seem to matter for our tiny test suite, as the message shows it made huge improvements in other regions of the input space.

I haven't reported this, but doing "x ^= x >> 32" instead worked fine for our test suite too.  Well, big deal - we're testing almost nothing beyond two kinds of "oops!" cases (nested tuples, and mixed-sign tiny ints).

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue34751>
_______________________________________


More information about the Python-bugs-list mailing list