[Python-Dev] Why not using the hash when comparing strings?

Fri Oct 19 12:02:19 CEST 2012

Hrvoje Niksic <hrvoje.niksic at avl.com> wrote:

> On 10/19/2012 03:22 AM, Benjamin Peterson wrote:
>> It would be interesting to see how common it is for strings which have
>> their hash computed to be compared.
> 
> Since all identifier-like strings mentioned in Python are interned, and 
> therefore have had their hash computed, I would imagine comparing them 
> to be fairly common. After all, strings are often used as makeshift 
> enums in Python.
> 
> On the flip side, those strings are typically small, so a measurable 
> overall speed improvement brought by such a change seems unlikely.

I'm pretty sure it would result in a small slowdown.

Many (most?) of the comparisons against interned identifiers will be done 
by dictionary lookups, and the dictionary lookup code only tries the string 
comparison after it has determined that the hashes match. The only time 
the contents of dictionary key strings are actually compared is when the 
hashes match but the pointers are different; it is already the case that if 
the hashes don't match, the strings are never compared.
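
To make the lookup order concrete, here is a rough sketch that can be run 
to observe it from Python. CountingKey and count_eq_calls are hypothetical 
names invented for this illustration, and the forced hash values are 
arbitrary. It uses a plain class rather than str, so it goes through the 
generic lookup path rather than the string-specialised one, and it relies 
on CPython's behaviour rather than anything the language guarantees.

```python
class CountingKey:
    """Hypothetical key type that counts how often __eq__ is called."""

    eq_calls = 0

    def __init__(self, text, forced_hash):
        self.text = text
        self._hash = forced_hash  # arbitrary value chosen by the caller

    def __hash__(self):
        return self._hash

    def __eq__(self, other):
        CountingKey.eq_calls += 1
        return isinstance(other, CountingKey) and self.text == other.text


def count_eq_calls(stored, probe):
    """Look up `probe` in a dict holding only `stored`.

    Returns (found, number of __eq__ calls made during the lookup).
    """
    table = {stored: "value"}  # inserting into an empty dict compares nothing
    CountingKey.eq_calls = 0
    found = probe in table
    return found, CountingKey.eq_calls


key = CountingKey("spam", 1)

# Same object: the pointer check succeeds, __eq__ is never called.
print(count_eq_calls(key, key))
# Different hashes: rejected on the hash alone, __eq__ is never called.
print(count_eq_calls(CountingKey("spam", 1), CountingKey("eggs", 2)))
# Same hash, different objects: only now are the contents compared.
print(count_eq_calls(CountingKey("spam", 1), CountingKey("spam", 1)))
```

On CPython the first two lookups report zero __eq__ calls and the third 
reports one, which is the case described above: the contents are compared 
only when the hashes match but the pointers differ.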