[issue14621] Hash function is not randomized properly

Mark Dickinson report at bugs.python.org
Wed Nov 7 12:55:11 CET 2012


Mark Dickinson added the comment:

[MAL]
> I don't understand why we are only trying to fix the string problem
> and completely ignore other key types.

[Armin]
> estimating the risks of giving up on a valid query for a truly random
> hash, at an overestimated one billion queries per second ...

That's fine in principle, but if this gets extended to integers, note that our current integer hash is about as far from 'truly random' as you can get:

    Python 3.4.0a0 (default:f02555353544, Nov  4 2012, 11:50:12) 
    [GCC 4.2.1 (Apple Inc. build 5664)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> [hash(i) for i in range(20)]
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
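
To make "far from truly random" concrete: in CPython 3.2 and later, an int's hash is its value reduced modulo sys.hash_info.modulus (2**61 - 1 on 64-bit builds). Because of this, any number of colliding int keys can be written down without any search. This is a minimal sketch, and `colliding_ints` is a hypothetical helper for illustration, not part of any library.

```python
import sys

# CPython reduces an int's hash modulo this prime (2**61 - 1 on 64-bit
# builds, 2**31 - 1 on 32-bit builds), keeping the sign and mapping
# -1 to -2.
P = sys.hash_info.modulus

def colliding_ints(target, count):
    """Hypothetical helper: return `count` distinct ints that all hash to `target`.

    Assumes 0 <= target < P, so target + k * P reduces back to target.
    """
    return [target + k * P for k in range(count)]

keys = colliding_ints(42, 5)
print([hash(k) for k in keys])  # [42, 42, 42, 42, 42]
```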

Moreover, it's going to be *very* hard to change the int hash while preserving the `x == y implies hash(x) == hash(y)` invariant across all the numeric types (int, float, complex, Decimal, Fraction, 3rd-party types that need to remain compatible).
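
The invariant holds because every built-in numeric type hashes its exact rational value the same way: reduce it modulo the prime P = sys.hash_info.modulus. The function below is a sketch, assuming the scheme described under "Hashing of numeric types" in the Python documentation (3.2+). The name `hash_fraction` is illustrative, and the function is not a public API.

```python
import sys
from decimal import Decimal
from fractions import Fraction

def hash_fraction(m, n):
    """Illustrative hash of the rational m / n, assuming ints m, n with n > 0."""
    P = sys.hash_info.modulus
    # Strip common factors of P (only matters if m and n are not coprime).
    while m % P == n % P == 0:
        m, n = m // P, n // P
    if n % P == 0:
        # n has no inverse modulo P; CPython uses a fixed value here.
        h = sys.hash_info.inf
    else:
        # P is prime, so pow(n, P - 2, P) is the inverse of n modulo P.
        h = (abs(m) % P) * pow(n, P - 2, P) % P
    if m < 0:
        h = -h
    # -1 is reserved as an error signal in CPython's C API, so it becomes -2.
    return -2 if h == -1 else h

# Equal values of different numeric types must produce equal hashes.
values = [Fraction(1, 2), 0.5, Decimal("0.5"), complex(0.5, 0)]
print(hash_fraction(1, 2), [hash(v) for v in values])
```

Randomizing the int hash on its own would break this agreement. Every other type in the list, including third-party ones that follow the same scheme, would have to change in lockstep.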

----------
nosy: +mark.dickinson

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue14621>
_______________________________________