[issue23712] Experiment: Assume that exact unicode hashes are perfect discriminators

Josh Rosenberg report at bugs.python.org
Fri Mar 20 01:47:47 CET 2015


Josh Rosenberg added the comment:

Assuming Siphash is in fact cryptographically secure (in the sense you can't choose a desired result hash with better odds that brute force), and it appears to be so, with a keyspace of 64 bits, if it's evenly distributed (which a cryptographically secure hash should be), that implies to even have a 1% chance of any two keys in a set colliding, you'd need over 2**29 keys ( you can plug in your own numbers for the conditional calculation here https://lazycackle.com/Probability_of_repeated_event_online_calculator__birthday_problem_.html ). Even at the one ten thousandth of a percent collision threshold, you're talking about a set of 6 million strings to find even one pair that match.

I'd still be leery of using such an approach for general purpose sets and dicts, where they could conceivably contain enough entries to pose a risk (vanishingly small, but not "heat death of the universe" small). But for Python implementation dictionaries (module, nested scope, class, and instance dictionaries), where we're talking about maybe a thousand attributes in extreme cases, which are almost never under the control of an "attacker" in any event, I could definitely see a low risk win. If you're assuming a dictionary with less than 10,000 keys, that's a hit would be literally one in a trillion; under a hundred and you're below one in a quadrillion chance, which I think is safe enough. If you wanted to make it "safe" you could conceivably use an approach that changed algorithms up front, depending on the size of the dictionary; less than a hundred entries, use hash only lookup, above 100, use "safe" lookup.

----------
nosy: +josh.r

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue23712>
_______________________________________


More information about the Python-bugs-list mailing list