[issue34751] Hash collisions for tuples
Tim Peters
report at bugs.python.org
Tue Sep 25 15:15:15 EDT 2018
Tim Peters <tim at python.org> added the comment:
>> j is even implies (j ^ -3) == -(j ^ 3)
> This follows from what I posted before: if j is even, then
> j ^ 3 is odd, so we can apply the rule x ^ -2 = -x to x = j ^ 3
> ...
Thanks! That helps a lot. I had a blind spot there. This kind of thing would have been best spelled out at the start: it's not just that -2 has some bad effects, but that there's a whole universe of potentially damaging identities:
j odd implies j = -(j ^ -2)
j even implies (j ^ 1) = -(j ^ -1)
j odd implies (j ^ 2) = -(j ^ -4)
j even implies (j ^ 3) = -(j ^ -3)
j odd implies (j ^ 4) = -(j ^ -6)
j even implies (j ^ 5) = -(j ^ -5)
j odd implies (j ^ 6) = -(j ^ -8)
j even implies (j ^ 7) = -(j ^ -7)
j odd implies (j ^ 8) = -(j ^ -10)
j even implies (j ^ 9) = -(j ^ -9)
j odd implies (j ^ 10) = -(j ^ -12)
j even implies (j ^ 11) = -(j ^ -11)
j odd implies (j ^ 12) = -(j ^ -14)
j even implies (j ^ 13) = -(j ^ -13)
j odd implies (j ^ 14) = -(j ^ -16)
j even implies (j ^ 15) = -(j ^ -15)
...
Every integer pairs up with another of the opposite sign in one of these - but with only one. That goes a long way toward explaining why collisions are frequent when mixing signs on ints of similar magnitude, but also why they don't "pile up" catastrophically.
But it also suggests a way to construct cases that _are_ catastrophic. For example:
>>> len(set(map(hash, product([-3, 3], repeat=20))))
1024
Ouch!
>>> c = Counter(map(hash, product([-3, 3], repeat=20)))
>>> len(c)
1024
>>> max(c.values())
1024
>>> min(c.values())
1024
So each of the 1024 hash codes showed up 1024 times.
----------
_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue34751>
_______________________________________
More information about the Python-bugs-list
mailing list