[issue34751] Hash collisions for tuples

Jeroen Demeyer report at bugs.python.org
Tue Oct 2 08:38:23 EDT 2018


Jeroen Demeyer <J.Demeyer at UGent.be> added the comment:

SeaHash seems to be designed for 64 bits. I'm guessing that replacing the shifts by

x ^= ((x >> 16) >> (x >> 29))

would be what you'd do for a 32-bit hash. Alternatively, we could always compute the hash with 64 bits (using uint64_t) and then truncate at the end if needed.

However, when testing the hash function

    for t in INPUT:
        x ^= hash(t)
        x *= MULTIPLIER
        x ^= ((x >> 16) >> (x >> 29))
        x *= MULTIPLIER

it fails horribly on both the original testsuite and my new one. I'm guessing that the problem is that the line x ^= ((x >> 16) >> (x >> 29)) ignores the low-order bits of x (the value being XORed in depends only on the top 16 bits), so it's too close to pure FNV, which is known to have problems. When replacing the first line of the loop above by x += hash(t) (DJB-style), it becomes too close to pure DJB, and it also fails horribly because of nested tuples.
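For anyone who wants to experiment with the loop above, here is a rough Python sketch of it with explicit 32-bit wraparound. Several things in it are assumptions rather than what was actually tested: the MULTIPLIER value (the 32-bit FNV prime is used as a placeholder), the seed of 0, the names tuple_hash32 and diffuse32, and the choice to recurse into nested tuples with the same function instead of calling the built-in hash().

```python
MASK32 = 0xFFFFFFFF

# Placeholder constant (the 32-bit FNV prime). The message above does not
# say which multiplier was used, so this value is an assumption.
MULTIPLIER = 0x01000193


def diffuse32(x):
    """Apply x ^= ((x >> 16) >> (x >> 29)) to a 32-bit value."""
    x &= MASK32
    return x ^ ((x >> 16) >> (x >> 29))


def tuple_hash32(items, combine="xor", seed=0):
    """Hash a tuple with the loop from the message, truncated to 32 bits.

    combine="xor" mixes each item with x ^= hash(t) (FNV-style);
    combine="add" mixes it with x += hash(t) (DJB-style).
    Nested tuples are hashed recursively with this same function
    (an assumption made so that nesting can be studied in isolation).
    """
    x = seed & MASK32
    for t in items:
        if isinstance(t, tuple):
            h = tuple_hash32(t, combine, seed)
        else:
            h = hash(t) & MASK32
        if combine == "xor":
            x ^= h
        else:
            x = (x + h) & MASK32
        x = (x * MULTIPLIER) & MASK32
        x = diffuse32(x)
        x = (x * MULTIPLIER) & MASK32
    return x


if __name__ == "__main__":
    for mode in ("xor", "add"):
        print(mode, hex(tuple_hash32((1, (2, 3), -4), combine=mode)))
```

Collision counts from a sketch like this depend on the multiplier and seed chosen, so it only shows the structure of the loop and does not reproduce the testsuite results.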

So it doesn't seem that the line x ^= ((x >> 16) >> (x >> 29)) (which is what makes SeaHash special) really helps much to solve the known problems with DJB or FNV.
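The claim that the shift line ignores the low-order bits of x can be checked directly for a 32-bit x. The following sketch (shift_term is just an illustrative helper name) computes the value that gets XORed into x and confirms that it is the same for every choice of the low 16 bits, and that the step never changes the high 16 bits.

```python
MASK32 = 0xFFFFFFFF


def shift_term(x):
    """Value XORed into x by x ^= ((x >> 16) >> (x >> 29)), for 32-bit x."""
    x &= MASK32
    return (x >> 16) >> (x >> 29)


high = 0xABCD0000  # arbitrary example value for the top 16 bits

# Vary the low 16 bits over every possible value and collect the terms.
terms = {shift_term(high | low) for low in range(0x10000)}

# The term is always below 2**16, so XORing it in leaves the top half alone.
unchanged_high = all(
    ((high | low) ^ shift_term(high | low)) >> 16 == high >> 16
    for low in range(0x10000)
)

if __name__ == "__main__":
    print(len(terms), unchanged_high)
```

In other words, the step only feeds high bits down into the low half; the low bits of x never influence anything at this point, which is in line with the behaviour staying close to the plain multiply-and-combine schemes.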

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue34751>
_______________________________________

