Planning a Python Course for Beginners

Peter Otten __peter__ at web.de
Thu Aug 10 16:35:00 EDT 2017


Marko Rauhamaa wrote:

> Peter Otten <__peter__ at web.de>:
> 
>> Steve D'Aprano wrote:
>>> The C code says:
>>> 
>>>>/* bottom 3 or 4 bits are likely to be 0; rotate y by 4 to avoid
>>>>excessive hash collisions for dicts and sets */
>>> 
>>> which I think agrees with my comment: using the id() itself would put
>>> too many objects in the same bucket (i.e. too many collisions).
>>
>> There's a subtle diffence: you expected objects with nearby memory
>> addresses to end up in the same "bucket" while actually all addresses
>> (are likely to) have the same low bits, and creation time does not
>> matter.
> 
> I see no point in CPython's rotation magic.

Let's see:

$ cat hashperf.py
class A(object):
    __slots__ = ["_hash"]

    def __hash__(self):
        return self._hash

def no_magic():
    a = A()
    a._hash = id(a)
    return a

def magic():
    a = A()
    a._hash = id(a) >> 4
    return a

$ python3 -m timeit -s 'from hashperf import magic, no_magic; s = 
{no_magic() for _ in range(10**5)}' 'for x in s: x in s'
10 loops, best of 3: 70.7 msec per loop

$ python3 -m timeit -s 'from hashperf import magic, no_magic; s = {magic() 
for _ in range(10**5)}' 'for x in s: x in s'
10 loops, best of 3: 52.8 msec per loop

"magic" wins this makeshift test. Other than that you're right ;)





More information about the Python-list mailing list