[issue5186] Reduce hash collisions for objects with no __hash__ method

Mark Dickinson report at bugs.python.org
Wed Feb 11 11:53:25 CET 2009


Mark Dickinson <dickinsm at gmail.com> added the comment:

[Adam Olsen]
> The alignment requirements (long double) make it impossible to have
> anything in those bits.

Not necessarily, since not all pointers passed to _Py_HashPointer come
from a PyObject_Malloc.  _Py_HashPointer is used for function pointers
as well.  For example, on 64-bit linux I get:

Python 2.7a0 (trunk:69516, Feb 11 2009, 10:43:51)
[GCC 4.2.1 (SUSE Linux)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from hashlib import sha224
>>> hash(sha224)
47100514454970
>>> hash(sha224) % 16
10

> for that you'd need a dictionary with at least 2 billion entries
> on 32bit,

<nitpick> If I recall correctly, the higher bits of the hash value also
get used in collision resolution: they're mixed in to the algorithm
that's used to produce the probing sequence when looking for an empty
slot.  So the dictionary wouldn't necessarily have to be quite that
large for the top bits to come into play. </nitpick>

But I agree that mixing the bottom bits back in (via rotate, or xor, or
whatever) doesn't seem likely to help.

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue5186>
_______________________________________


More information about the Python-bugs-list mailing list