[issue34751] Hash collisions for tuples

Wed Sep 26 17:23:15 EDT 2018

Tim Peters <tim at python.org> added the comment:

>> The two-liner above with the xor in the second line is
>> exactly Bernstein 33A, followed by a permutation
>> of 33A's _output_ space.

> Not output space, but internal state

?  33A's output _is_ its internal state at the end.  This is a distinction that makes no difference.  I do distinguish here between 33A and the output of Python's `tuplehash()`, but the output space of both on a 64-bit box is a 64-bit int, so that's another pointless distinction to me.
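For reference, a minimal sketch of Bernstein's 33A as it is usually written, showing that the value returned is simply the loop state after the last input. The name `djbx33a`, the seed 5381, and the 64-bit mask (standing in for C unsigned wraparound) are choices made for this illustration, not code from the tracker.

```python
MASK64 = (1 << 64) - 1  # emulate unsigned 64-bit wraparound, as on a 64-bit box


def djbx33a(data: bytes, seed: int = 5381) -> int:
    """Bernstein's 33A: multiply the state by 33, then add the next byte."""
    h = seed
    for byte in data:
        h = (h * 33 + byte) & MASK64
    # No separate finalization step: the internal state is the output.
    return h
```

Because nothing happens after the loop, permuting "the output" and permuting "the final internal state" describe the same operation.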

> (I assume that you do that operation inside the loop).

Yes, as I said at the start, "this is the only code remaining in the loop apart from setting y to the next tuple component's hash".
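A rough sketch of what such a loop could look like, assuming a multiply-add line followed by an xor line applied on every iteration. This message does not restate the exact xor step, so `permute` (here `x ^= x >> 32`), the multiplier 33, and the seed 5381 are hypothetical stand-ins rather than the code under discussion.

```python
MASK64 = (1 << 64) - 1  # emulate unsigned 64-bit arithmetic


def permute(x: int) -> int:
    # Hypothetical xor step. On 64-bit values it is a bijection
    # (in fact its own inverse), so it only relabels the state space.
    return x ^ (x >> 32)


def chained_hash(component_hashes, mult: int = 33, seed: int = 5381) -> int:
    x = seed
    for y in component_hashes:       # y: the next tuple component's hash
        x = (x * mult + y) & MASK64  # 33A-style multiply-add
        x = permute(x)               # permutation applied inside the loop
    return x
```

Negative component hashes are handled by the mask, which mirrors a cast to an unsigned 64-bit integer in C.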

> It's replacing DJBX33A by a different algorithm which is not
> DJBX33A.

Replacing DJBX33A's multiplier of 33 is also a different algorithm.  So is working with inputs other than unsigned bytes.  So is mucking with the inputs before adding them in.

> It may or may not work, that's not my point. It's just that
> I would avoid changing the structure of the algorithm if
> there is no reason to.

Which is also fine by me, except I see no actual reason to care.  All variations of "chain permutations" I've tried appear to work fine, except for those (which I haven't mentioned at all) that tried replacing multiplication with "weaker" permutations.

A minor exception is that, as already mentioned, applying the "leftshift-xor" permutation to the inputs with a shift count of 1 didn't pass the original tuple hash test in several cases.
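To make the "leftshift-xor" permutation concrete, here is a small sketch, assuming it means `y ^= y << shift` on a fixed-width unsigned value. The function name and the `bits` parameter are introduced only for this illustration, and the sketch does not attempt to reproduce the tuple hash test results mentioned above.

```python
def leftshift_xor(y: int, shift: int = 1, bits: int = 64) -> int:
    """Return y ^ (y << shift), truncated to an unsigned value of `bits` bits."""
    mask = (1 << bits) - 1
    return (y ^ (y << shift)) & mask


# The map is a permutation: checked exhaustively here on 8-bit values.
images = {leftshift_xor(v, shift=1, bits=8) for v in range(256)}
is_permutation = len(images) == 256
```

Being a permutation means no two inputs are merged before they are added in; it says nothing about how well the bits are mixed afterwards.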

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue34751>
_______________________________________