Planning a Python Course for Beginners

Marko Rauhamaa marko at pacujo.net
Thu Aug 10 09:31:56 EDT 2017


Peter Otten <__peter__ at web.de>:
> Steven D'Aprano wrote:
>> On Wed, 09 Aug 2017 20:07:48 +0300, Marko Rauhamaa wrote:
>> 
>>> Good point! A very good __hash__() implementation is:
>>> 
>>>     def __hash__(self):
>>>         return id(self)
>>> 
>>> In fact, I didn't know Python (kinda) did this by default already. I
>>> can't find that information in the definition of object.__hash__():
>> 
>> 
>> Hmmm... using id() as the hash would be a terrible hash function.

id() is actually an ideal return value of __hash__(). The only criterion
is that the returned number should be different whenever __eq__()
returns False. That is definitely true for id().
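
As a minimal sketch of that claim, assuming a class that keeps the
default identity-based __eq__() (the Node class and its label attribute
are hypothetical, made up for illustration): equal objects are then the
same object, so they trivially share a hash, and distinct live objects
have distinct id() values.

```python
class Node:
    """Hypothetical class using the identity-based hash from the thread."""

    def __init__(self, label):
        self.label = label

    # __eq__ is not overridden, so equality is identity (a is b).
    def __hash__(self):
        return id(self)


nodes = [Node(i) for i in range(100)]
node_set = set(nodes)                           # no two nodes are merged
lookup = {node: node.label for node in nodes}   # nodes work as dict keys
twin = Node(0)  # same label as nodes[0], but a different object

print(len(node_set))          # one entry per object
print(twin in node_set)       # a look-alike is not found
print(lookup[nodes[42]])      # lookup by the original object works
```

This only holds while __eq__() is identity-based. A class that defines
value equality must also hash by value, since objects that compare equal
are required to have equal hashes.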

> It's actually id(self) >> 4 (almost, see C code below), to account for
> memory alignment.

Memory alignment makes no practical difference. If it is any good, the
internal implementation will further scramble and scale the returned
hash value. For example:

    index = hash(obj) % prime_table_size
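
A rough sketch of that reduction step, using simulated 16-byte-aligned
addresses in place of real id() values. The table size of 101 and the
base address are arbitrary assumptions, and this models the generic
modulo-a-prime scheme above, not necessarily what CPython's dict does
internally.

```python
from collections import Counter

PRIME_TABLE_SIZE = 101  # hypothetical table size, chosen only for illustration

# Simulated 16-byte-aligned addresses, standing in for id() values.
addresses = [0x1000 + 16 * i for i in range(PRIME_TABLE_SIZE)]

# Reduce each "hash" to a bucket index, as in the line above.
buckets = Counter(addr % PRIME_TABLE_SIZE for addr in addresses)

distinct_buckets = len(buckets)
largest_bucket = max(buckets.values())
print(distinct_buckets, largest_bucket)
```

Because 16 and 101 share no common factor, the 101 aligned addresses
land in 101 different buckets, one per bucket.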

>> would fall into similar buckets if they were created at similar
>> times, regardless of their value, rather than being well distributed.
>
> If that were the problem it wouldn't be solved by the current approach:

It is not a problem. Hash values don't need to be well distributed; they
simply need to differ whenever the objects compare unequal.

> >>> sample = [object() for _ in range(10)]
> >>> [hash(b) - hash(a) for a, b in zip(sample, sample[1:])]
> [1, 1, 1, 1, 1, 1, 1, 1, 1]

Nice demo :-)


Marko


