[New-bugs-announce] [issue23712] Experiment: Assume that exact unicode hashes are perfect discriminators
Raymond Hettinger
report at bugs.python.org
Thu Mar 19 20:57:42 CET 2015
New submission from Raymond Hettinger:
This tracker item is for a thought experiment I'm running where I can collect the thoughts and discussions in one place. It is not an active proposal for inclusion in Python.
The idea is to greatly speed-up the language for set/dict lookups of unicode value by skipping the exact comparison when the unicode type is exact and the 64-bit hash values are known to match.
Given the siphash and hash randomization, we get a 1 in 2**64 chance of a false positive (which is better than the error rate for non-ECC DRAM itself).
However, since the siphash isn't cryptographically secure, presumably a malicious chooser of keys could generate a false positive on-purpose.
This technique is currently used by git and mercurial which use hash values for file and version graphs without checking for an exact match (because the chance of a false positive is vanishingly rare).
The Python test suite passes as does the test suites for a number of packages I have installed.
----------
assignee: rhettinger
components: Interpreter Core
files: assume_perf_uni_hash.diff
keywords: patch
messages: 238552
nosy: rhettinger
priority: normal
severity: normal
status: open
title: Experiment: Assume that exact unicode hashes are perfect discriminators
type: performance
versions: Python 3.5
Added file: http://bugs.python.org/file38565/assume_perf_uni_hash.diff
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue23712>
_______________________________________
More information about the New-bugs-announce
mailing list