[New-bugs-announce] [issue23712] Experiment: Assume that exact unicode hashes are perfect discriminators

Thu Mar 19 20:57:42 CET 2015

New submission from Raymond Hettinger:

This tracker item is for a thought experiment I'm running where I can collect the thoughts and discussions in one place.  It is not an active proposal for inclusion in Python.

The idea is to greatly speed-up the language for set/dict lookups of unicode value by skipping the exact comparison when the unicode type is exact and the 64-bit hash values are known to match.

Given the siphash and hash randomization, we get a 1 in 2**64 chance of a false positive (which is better than the error rate for non-ECC DRAM itself).  

However, since the siphash isn't cryptographically secure, presumably a malicious chooser of keys could generate a false positive on-purpose.

This technique is currently used by git and mercurial which use hash values for file and version graphs without checking for an exact match (because the chance of a false positive is vanishingly rare).

The Python test suite passes as does the test suites for a number of packages I have installed.

----------
assignee: rhettinger
components: Interpreter Core
files: assume_perf_uni_hash.diff
keywords: patch
messages: 238552
nosy: rhettinger
priority: normal
severity: normal
status: open
title: Experiment:  Assume that exact unicode hashes are perfect discriminators
type: performance
versions: Python 3.5
Added file: http://bugs.python.org/file38565/assume_perf_uni_hash.diff

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue23712>
_______________________________________