[Python-Dev] Hashes in Python3.5 for tuples and frozensets

Chris Angelico rosuav at gmail.com
Thu May 17 10:34:50 EDT 2018


On Fri, May 18, 2018 at 12:15 AM, Anthony Flury via Python-Dev
<python-dev at python.org> wrote:
> Chris,
> I entirely agree. The same questioner also asked which data type is fastest
> to use as a dictionary key, and which data structure is fastest. I get
> the impression the person is very into micro-optimization, without profiling
> their application. It seems every choice is made based on the speed of a
> single operation, without considering how often that operation is used.

Sounds like we're on the same page here.

> On 17/05/18 09:16, Chris Angelico wrote:
>> The hash values of Python objects are calculated by the __hash__
>> method, so arbitrary objects can do what they like, including
>> degenerate algorithms such as:
>>
>> class X:
>>      def __hash__(self): return 7
>
> Agreed - I should have said the default hash algorithm. Hashes for custom
> objects are entirely application-dependent.
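
For what it's worth, here is a minimal sketch of what the degenerate
class quoted above does in practice. The names a, b and d are just
illustrative. Since X defines no __eq__, the default identity
comparison applies, so a dict still tells two instances apart even
though their hashes collide; lookups simply lose the benefit of
hashing.

```python
class X:
    """Every instance hashes to 7; equality is the default identity check."""
    def __hash__(self):
        return 7

a, b = X(), X()

# Both keys land in the same hash bucket, but they compare unequal,
# so the dict keeps them as two separate entries.
d = {a: "first", b: "second"}

print(hash(a), hash(b), len(d))
```

The dict stays correct because it falls back on == after the hashes
match; with many such keys, every lookup degrades towards a linear
scan.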

There isn't a single "default hash algorithm"; in fact, I'm not sure
that there's even a single algorithm used for all strings. Certainly
the algorithm used for integers is completely different from the
one(s) used for strings; we have a guarantee that ints and floats
representing the same real number are going to have the same hash
(even if that hash isn't equal to the number -
hash(1e22)==hash(10**22)!=10**22 is True), since they compare equal.
The algorithms used and the resulting hashes may change between Python
versions, when you change interpreters (PyPy vs Jython vs CPython vs
Brython...), or even when you change word sizes, I believe (32-bit vs
64-bit).
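
A quick way to see this for yourself (a sketch only; sys.hash_info
describes the running interpreter, and the modular reduction shown
here is how CPython hashes non-negative ints, not something to rely
on elsewhere):

```python
import sys

# An int and a float representing the same real number compare equal,
# so they are required to hash equal.
same_value = (1e22 == 10**22)
same_hash = (hash(1e22) == hash(10**22))

# The hash need not equal the number itself: 10**22 does not fit in
# the hash width, so it gets reduced.
hash_is_not_value = (hash(10**22) != 10**22)

# CPython detail: non-negative ints are reduced modulo
# sys.hash_info.modulus, which differs between 32-bit and 64-bit builds.
reduced = 10**22 % sys.hash_info.modulus

print(same_value, same_hash, hash_is_not_value)
print(sys.hash_info.width, sys.hash_info.modulus, reduced == hash(10**22))
```

Run it on a different build or interpreter and the width and modulus
lines may well print something else, which is rather the point.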

So, this is (a) a premature optimization, (b) dependent on something
that's not guaranteed, and (c) a great way to paint yourself into a
corner. Perfect! :)

ChrisA

