Planning a Python Course for Beginners

Chris Angelico rosuav at gmail.com
Thu Aug 10 17:31:02 EDT 2017


On Fri, Aug 11, 2017 at 7:17 AM, Marko Rauhamaa <marko at pacujo.net> wrote:
> That's interesting, but suggests there's something weird (~ suboptimal)
> going on with CPython's scrambling algorithm. Also, your performance
> test might yield completely different results on other Python
> implementations.
>
> Apart from uniqueness, there's no particular reason to prefer one
> __hash__() value over another as long as the interesting bits are inside
> the CPU's simple integer range.

Not true. Every time you probe a new location [1], you have to fetch
more data from RAM. That delays you significantly. An ideal hashing
system is going to give a high probability of giving you an empty
bucket on the first try, to minimize the number of main memory
accesses required.

CPython's scrambling algorithm means that, even when its first try
doesn't succeed, there's a good chance that its second will succeed.
But that doesn't change the fact that you want the first one to
succeed as often as possible.

ChrisA

[1] Unless it's within the same cache line, which is eight pointers
wide on my CPU. Highly unlikely when working with 100,000 pointers.



More information about the Python-list mailing list