Planning a Python Course for Beginners

Thu Aug 10 12:58:17 EDT 2017

On Fri, Aug 11, 2017 at 2:41 AM, Steve D'Aprano
<steve+python at pearwood.info> wrote:
> On Fri, 11 Aug 2017 12:58 am, Chris Angelico wrote:
>
>> On Fri, Aug 11, 2017 at 12:45 AM, Steve D'Aprano
>> <steve+python at pearwood.info> wrote:
>>
>>> The C code says:
>>>
>>>>     /* bottom 3 or 4 bits are likely to be 0; rotate y by 4 to avoid
>>>>        excessive hash collisions for dicts and sets */
>>>
>>> which I think agrees with my comment: using the id() itself would put too
>>> many objects in the same bucket (i.e. too many collisions).
>>>
>>>
>>>> If that were the problem it wouldn't be solved by the current approach:
>>>>
>>>>>>> sample = [object() for _ in range(10)]
>>>>>>> [hash(b) - hash(a) for a, b in zip(sample, sample[1:])]
>>>> [1, 1, 1, 1, 1, 1, 1, 1, 1]
>>
>> A difference of 1 in a hash is usually going to mean dropping
>> something into the next bucket. A difference of 4, 8, or 16 would mean
>> that a tiny dictionary (which has 8 slots and thus uses modulo-8)
>> would have everything on the same slot.
>
> Um... yes? And how does that relate to the comment given in the source code?
>
> "bottom 3 or 4 bits are likely to be 0; rotate y by 4 to avoid excessive hash
> collisions for dicts and sets"
> Are we in agreement so far?

Yes, we're in agreement. It may have been unclear from my quoting
style, but the main point I was disagreeing with was this:

>>>> If that were the problem it wouldn't be solved by the current approach:
>>>>
>>>>>>> sample = [object() for _ in range(10)]
>>>>>>> [hash(b) - hash(a) for a, b in zip(sample, sample[1:])]
>>>> [1, 1, 1, 1, 1, 1, 1, 1, 1]

Incrementing hashes by 1 usually will put things into successive
buckets. Incrementing by 8 or 16 will usually put things into the same
bucket.

Sorry for the confusion.

ChrisA