Any built-in ishashable method ?

Steven D'Aprano steve+comp.lang.python at pearwood.info
Fri Jan 18 07:16:52 EST 2013


On Fri, 18 Jan 2013 12:56:09 +0100, Jean-Michel Pichavant wrote:

>> So I'm guessing you had a key where
>> 
>> key1 == key2 did not imply hash(key1) == hash(key2)
>> 
>> I don't see a way to avoid that problem in a look-before-you-leap test.
>> 
>> --
>> http://mail.python.org/mailman/listinfo/python-list
>> 
>> 
> You guessed right. But it took me a lot of time before jumping to that
> conclusion, mostly because the code worked in the first place (it can
> with a little bit of luck). Now I'm extra careful about what I use as
> dict key, I was just wondering if there was a reliable quick way to
> verify an object can be used as key.

No. You can verify that an object is hashable, by trying to hash it, but 
there is no quick way to verify that the object's hash method isn't buggy.


> That brings me to another question, is there any valid test case where
> key1 != key2 and hash(key1) == hash(key2) ? Or is it some kind of design
> flaw ?

Neither. It is a consequence of the nature of the universe: the 
Pigeonhole Principle. You can't put N+1 pigeons into N pigeonholes 
without at least one pigeonhole containing two pigeons.

Take, for example, ints. There are an infinite number of possible ints in 
Python, or at least limited only by memory. But there are only a finite 
number of values they can be hashed to: hash values are limited to the 
values -2147483648 through 2147483647 in Python 2.7. You can't have an 
infinite set of values hash to a finite number of hashes without there 
being some duplicates. And so we have:

py> hash(1)
1
py> hash(2**32)
1
py> hash(2**64)
1
py> hash(2**128)
1
py> hash(2**2048)
1


etc. Likewise, there are an infinite number of potential strings, which 
must hash into a finite number of hash values. Although it is hard for me 
to find hash collisions, I know that there must be some, because there 
are more strings than hash values. This is not a design flaw in the hash 
function, and can't be fixed by finding a more clever hash function. Nor 
is it a deliberate feature. It's just the way hashing works.



-- 
Steven



More information about the Python-list mailing list