Planning a Python Course for Beginners

Thu Aug 10 05:00:54 EDT 2017

Steven D'Aprano wrote:

> On Wed, 09 Aug 2017 20:07:48 +0300, Marko Rauhamaa wrote:
> 
>> Good point! A very good __hash__() implementation is:
>> 
>>     def __hash__(self):
>>         return id(self)
>> 
>> In fact, I didn't know Python (kinda) did this by default already. I
>> can't find that information in the definition of object.__hash__():
> 
> 
> Hmmm... using id() as the hash would be a terrible hash function. Objects

It's actually id(self) >> 4 (almost, see C code below), to account for 
memory alignment.

>>> obj = object()
>>> hex(id(obj))
'0x7f1f058070b0'
>>> hex(hash(obj))
'0x7f1f058070b'

>>> sample = (object() for _ in range(10))
>>> all(id(obj) >> 4 == hash(obj) for obj in sample)
True

> would fall into similar buckets if they were created at similar times,
> regardless of their value, rather than being well distributed. 

If that were the problem it wouldn't be solved by the current approach:

>>> sample = [object() for _ in range(10)]
>>> [hash(b) - hash(a) for a, b in zip(sample, sample[1:])]
[1, 1, 1, 1, 1, 1, 1, 1, 1]

Py_hash_t
_Py_HashPointer(void *p)
{
    Py_hash_t x;
    size_t y = (size_t)p;
    /* bottom 3 or 4 bits are likely to be 0; rotate y by 4 to avoid
       excessive hash collisions for dicts and sets */
    y = (y >> 4) | (y << (8 * SIZEOF_VOID_P - 4));
    x = (Py_hash_t)y;
    if (x == -1)
        x = -2;
    return x;
}