[issue34751] Hash collisions for tuples

Tim Peters report at bugs.python.org
Tue Sep 25 15:15:15 EDT 2018


Tim Peters <tim at python.org> added the comment:

>> j is even implies (j ^ -3) == -(j ^ 3)

> This follows from what I posted before: if j is even, then
> j ^ 3 is odd, so we can apply the rule x ^ -2 = -x to x = j ^ 3
> ...

Thanks!  That helps a lot.  I had a blind spot there.  This kind of thing would have been best spelled out at the start:  it's not just that -2 has some bad effects, but that there's a whole universe of potentially damaging identities:

j odd  implies   j      = -(j ^ -2)
j even implies  (j ^ 1) = -(j ^ -1)
j odd  implies  (j ^ 2) = -(j ^ -4)
j even implies  (j ^ 3) = -(j ^ -3)
j odd  implies  (j ^ 4) = -(j ^ -6)
j even implies  (j ^ 5) = -(j ^ -5)
j odd  implies  (j ^ 6) = -(j ^ -8)
j even implies  (j ^ 7) = -(j ^ -7)
j odd  implies  (j ^ 8) = -(j ^ -10)
j even implies  (j ^ 9) = -(j ^ -9)
j odd  implies  (j ^ 10) = -(j ^ -12)
j even implies  (j ^ 11) = -(j ^ -11)
j odd  implies  (j ^ 12) = -(j ^ -14)
j even implies  (j ^ 13) = -(j ^ -13)
j odd  implies  (j ^ 14) = -(j ^ -16)
j even implies  (j ^ 15) = -(j ^ -15)
...

Every integer pairs up with another of the opposite sign in one of these - but with only one.  That goes a long way toward explaining why collisions are frequent when mixing signs on ints of similar magnitude, but also why they don't "pile up" catastrophically.

But it also suggests a way to construct cases that _are_ catastrophic.  For example:

>>> len(set(map(hash, product([-3, 3], repeat=20))))
1024

Ouch!

>>> c = Counter(map(hash, product([-3, 3], repeat=20)))
>>> len(c)
1024
>>> max(c.values())
1024
>>> min(c.values())
1024

So each of the 1024 hash codes showed up 1024 times.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue34751>
_______________________________________


More information about the Python-bugs-list mailing list