Planning a Python Course for Beginners

Chris Angelico rosuav at gmail.com
Thu Aug 10 10:58:29 EDT 2017


On Fri, Aug 11, 2017 at 12:45 AM, Steve D'Aprano
<steve+python at pearwood.info> wrote:

> The C code says:
>
>>     /* bottom 3 or 4 bits are likely to be 0; rotate y by 4 to avoid
>>        excessive hash collisions for dicts and sets */
>
> which I think agrees with my comment: using the id() itself would put too many
> objects in the same bucket (i.e. too many collisions).
>
>
>> If that were the problem it wouldn't be solved by the current approach:
>>
>>>>> sample = [object() for _ in range(10)]
>>>>> [hash(b) - hash(a) for a, b in zip(sample, sample[1:])]
>> [1, 1, 1, 1, 1, 1, 1, 1, 1]

A difference of 1 in a hash is usually going to mean dropping
something into the next bucket. A difference of 4, 8, or 16 would mean
that a tiny dictionary (which has 8 slots and thus uses modulo-8)
would have everything on the same slot.

>
> So my money is on object() being anomalous: because it is so small, the hashes
> end up so similar. For "typical" classes, the hash function does a much better
> job of mixing the hash values up.
>

And this is also possible, but most likely the difference would simply
widen; small dictionaries would still have all objects landing in the
zeroth bucket. Hence rotating away the low bits.

ChrisA



More information about the Python-list mailing list