[issue34751] Hash collisions for tuples

Tim Peters report at bugs.python.org
Sat Sep 29 02:03:26 EDT 2018


Tim Peters <tim at python.org> added the comment:

Jeroen, thanks for helping us fly slightly less blind! ;-) It's a lot of work.

I'd say you may as well pick a prime.  It's folklore, but "a reason" is that you've discovered that regular bit patterns in multipliers can hurt, and sticking to primes eliminates worlds of simple & subtle bit patterns.

Another bit of folklore:  pick a multiplier such that for each byte, neither it nor its complement are close in value to any other byte of the multiplier.  Which may seem more valuable for byte-oriented hashes, yet the byte-oriented FNV violates it spectacularly:  all FNV multipliers have _no_ bits set higher than 2**8 except for their leading bit.  So the longer the FNV multiplier, the more all-0 bytes it contains.

Which appears to be a deliberate choice to limit how quickly each new input byte propagates to other lower-order state bits across iterations.  The DJB33 algorithms accomplish that by using a tiny multiplier (relative to the output hash width) instead.

As I explained in other msgs here, treating tuple component hashes as sequences of bytes seems to deliver excellent results with the unaltered byte-oriented versions of FNV-1[a] and DJBX33[AX] - too good for anything in my test collection to detect anything even suspicious, let alone "a problem" (although it would be easy to contrive problem cases for 33[AX] - 33 is way "too small" to avoid collisions even across tuples with a single component).

So part of our challenge appears to me to be that we're trying instead to produce a hash from inputs of the _same_ bit width.  DJB/FNV were designed to produce hashes of width at least 4 times their inputs' bit widths.  Each input has a relatively tiny effect on the internal state for them.  For us, each input can change the entire internal state.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue34751>
_______________________________________


More information about the Python-bugs-list mailing list