[issue34751] Hash collisions for tuples

Raymond Hettinger report at bugs.python.org
Fri Sep 21 15:26:57 EDT 2018


Raymond Hettinger <raymond.hettinger at gmail.com> added the comment:

ISTM the collision of "hash((1,0,0)) == hash((1,-2,-2))" also stems from integers hashing to themselves and that twos-complement arithmetic relation, -x == ~x+1.  Further, that example specifically exploits that "hash(-1) == hash(-2)" due to how error codes.  In a way, tuple hashing is incidental.

Occasional collisions aren't really a big deal -- it is normal for dicts to have some collisions and to resolve them.  I would be more concerned is non-contrived datasets had many elements that gave exactly the same hash value resulting in a pile-up in one place.

FWIW, the current algorithm also has some advantages that we don't want to lose. In particular, the combination of the int hash and tuple hash are good at producing distinct values for non-negative numbers:

    >>> from itertools import product
    >>> len(set(map(hash, product(range(100), repeat=4))))
    100000000

The current hash is in the pretty-darned-good category and so we should be disinclined to change battle-tested code that is known to work well across a broad range of use cases.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue34751>
_______________________________________


More information about the Python-bugs-list mailing list