[Python-Dev] Hashes in Python3.5 for tuples and frozensets

Thu May 17 10:15:59 EDT 2018

Chris,
I entirely agree. The same questioner also asked which data type is 
fastest to use as a dictionary key, and which data structure is 
fastest. I get the impression the person is very into 
micro-optimization without having profiled their application. Every 
choice seems to be made on the speed of a single operation, without 
considering how often that operation is actually used.

On 17/05/18 09:16, Chris Angelico wrote:
> On Thu, May 17, 2018 at 5:21 PM, Anthony Flury via Python-Dev
> <python-dev at python.org> wrote:
>> Victor,
>> Thanks for the link, but to be honest it will just confuse people - neither
>> the link nor the related bpo entries state that the fix is limited to
>> strings. They simply talk about hash randomization - which in my opinion
>> implies ALL hash algorithms; which is why I asked the question.
>>
>> I am not sure how much should be exposed about the scope of security fixes
>> but you can understand my (and others') confusion.
>>
>> I am aware that applications shouldn't make assumptions about the value of
>> any given hash - apart from some simple assumptions based on hash value
>> equality (i.e. if two objects have different hash values they can't be the
>> same value).
> The hash values of Python objects are calculated by the __hash__
> method, so arbitrary objects can do what they like, including
> degenerate algorithms such as:
>
> class X:
>      def __hash__(self): return 7
Agreed - I should have said the default hash algorithm. Hashes for 
custom objects are entirely application dependent.
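
To make the point about custom hashes concrete, here is a minimal 
sketch. The class name ConstantHash and its __eq__ method are my own 
additions for illustration (the X example in the thread defines only 
__hash__). A constant hash is legal: dict and set stay correct because 
they fall back on __eq__ when hashes collide, they just get slower.

```python
class ConstantHash:
    """Hypothetical example: every instance hashes to 7."""

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return 7  # degenerate but legal: all instances collide

    def __eq__(self, other):
        return isinstance(other, ConstantHash) and self.value == other.value


a, b, c = ConstantHash("a"), ConstantHash("b"), ConstantHash("a")
table = {a: 1, b: 2}

same_hash = hash(a) == hash(b)      # both instances hash to 7
lookup_by_equal_key = table[c]      # c == a, so this finds a's entry
distinct_members = len({a, b, c})   # a and c are equal, b is not
```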
>
> So it's impossible to randomize ALL hashes at the language level. Only
> str and bytes hashes are randomized, because they're the ones most
> likely to be exploitable - for instance, a web server will receive a
> query like "http://spam.example/target?a=1&b=2&c=3" and provide a
> dictionary {"a":1, "b":2, "c":3}. Similarly, a JSON decoder is always
> going to create string keys in its dictionaries (JSON objects). Do you
> know of any situation in which an attacker can provide the keys for a
> dict/set as integers?
I was just asking the question - rather than critiquing the fault-fix. I 
am actually more concerned that the documentation relating to the fix 
doesn't make it clear that only strings have their hashes randomised.
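
For anyone who wants to check which built-in types are affected, here 
is a rough sketch. It assumes CPython 3.7 or later and uses the 
PYTHONHASHSEED environment variable to start fresh interpreters with 
different seeds; the helper name hash_in_fresh_process is made up for 
this example.

```python
import os
import subprocess
import sys


def hash_in_fresh_process(expression, seed):
    """Return hash(expression) as computed by a new interpreter
    started with PYTHONHASHSEED set to `seed`."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({expression}))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(result.stdout)


expressions = ('"spam"', 'b"spam"', "12345", "(1, 2, 3)",
               "frozenset({1, 2, 3})", '("spam", 1)')
for expr in expressions:
    h1 = hash_in_fresh_process(expr, seed=1)
    h2 = hash_in_fresh_process(expr, seed=2)
    print(f"{expr:22} seed-dependent: {h1 != h2}")
```

On the CPython builds I would expect this to report str and bytes as 
seed-dependent, and ints, tuples of ints and frozensets of ints as not. 
A tuple that contains a string inherits the string's randomized hash, 
which may be part of why the behaviour looks inconsistent from outside.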

>> BTW:
>>
>> This question was prompted by a question on a social media platform about
>> whether hash values are transferable across platforms.
>> Everything I could find stated that after Python 3.3 ALL hash values were
>> randomized - but that clearly isn't the case; and the original questioner
>> identified that some hash values are randomized and others aren't.
>>
> That's actually immaterial. Even if the hashes weren't actually
> randomized, you shouldn't be making assumptions about anything
> specific in the hash, save that *within one Python process*, two equal
> values will have equal hashes (and therefore two objects with unequal
> hashes will not be equal).
Entirely agree - I was just trying to get to the bottom of the 
difference - especially considering that the documentation I could find 
implied that all hash algorithms had been randomized.
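
The one guarantee mentioned above (equal values have equal hashes 
within a process) can be seen with the numeric types. This is only a 
small illustration; the variable names are my own.

```python
from decimal import Decimal
from fractions import Fraction

# Five objects of different types that all compare equal to 1.
equal_numbers = [1, 1.0, True, Fraction(1, 1), Decimal("1")]

all_equal = all(x == 1 for x in equal_numbers)

# Equal values must hash equally, so the set of hashes has one member.
hashes = {hash(x) for x in equal_numbers}

# The contrapositive: objects with different hashes cannot be equal.
different_hash_implies_unequal = (hash(1) == hash(2)) or (1 != 2)
```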
>> I did suggest strongly to the original questioner that relying on the same
>> hash value across different platforms wasn't a clever solution - their
>> original plan was to store hash values in a cross system database to enable
>> quick retrieval of data (!!!). I did remind the OP that a hash value wasn't
>> guaranteed to be unique anyway - and they might come across two different
>> values with the same hash - and no way to distinguish between them if all
>> they have is the hash. Hopefully their revised design will store the key,
>> not the hash.
> Uhh.... if you're using a database, let the database do the work of
> being a database. I don't know what this "cross system database" would
> be implemented in, but if it's a proper multi-user relational database
> engine like PostgreSQL, it's already going to have way better indexing
> than anything you'd do manually. I think there are WAY better
> solutions than worrying about Python's inbuilt hashing.
Agreed
> If you MUST hash your data for sharing and storage, the easiest
> solution is to just use a cryptographic hash straight out of
> hashlib.py.
As stated before - I think the original questioner was intent on 
micro-optimizations - and they had hit on the idea that storing an 
integer would be quicker than storing a string - entirely ignoring both 
the impracticality of trying to encode every string as an integer (since 
hashes aren't guaranteed not to collide), and the problem of reversing 
that translation once the stored key had been retrieved.
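
A short sketch of both points, assuming SHA-256 from hashlib as the 
cryptographic hash (the thread does not name one). The helper 
stable_digest and the records dictionary are hypothetical, purely for 
illustration.

```python
import hashlib


def stable_digest(text):
    """Hex SHA-256 digest of the UTF-8 encoding of `text`.
    Unlike hash(), this does not vary between processes or platforms."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


digest = stable_digest("abc")

# The built-in hash can collide for unequal values. On CPython,
# hash(-1) and hash(-2) are both -2 (an implementation detail).
builtin_collision = (-1 != -2) and hash(-1) == hash(-2)

# Hypothetical record store: keep the key itself, and treat any
# digest as an index aid rather than a replacement for the key.
records = {"abc": {"digest": digest, "payload": "some data"}}
```

A digest cannot be reversed to recover the original string either, 
which is one more reason to store the key alongside it.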
> ChrisA

Thanks for your comments :-)

-- 
Anthony Flury
email : *Anthony.flury at btinternet.com*
Twitter : *@TonyFlury <https://twitter.com/TonyFlury/>*