dictionary keys, __hash__, __cmp__

Miika Keskinen miika.keskinen at utu.fi
Wed Nov 5 02:28:13 EST 2003


On Tue, 04 Nov 2003 21:52:42 +0100, Jan-Erik Meyer-Lütgens wrote:

> In the Python Language Reference, I found the following statements about
> using objects as dictionary keys:
> 
>     1. "__hash__() should return a 32-bit integer."
> 
>     2. "The only required property is that objects which
>         compare equal have the same hash value."
> 
>     3. "If a class does not define a __cmp__() method it
>         should not define a __hash__() operation either."
> 
> 
> Can I assume that:
> 
>   -- it is guaranteed that id(obj) returns a unique 32-bit integer

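Not quite. id() returns the current address of the object in memory, so
it is unique only among objects that exist at the same time, and it is
32 bits wide only on a 32-bit platform.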

**SNIP**
Help on built-in function id:

id(...)
    id(object) -> integer
    
    Return the identity of an object.  This is guaranteed to be unique
    among simultaneously existing objects.  (Hint: it's the object's
    memory address.)

**SNIP**
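
As a small illustration of the docstring above (only a sketch, assuming a
current Python 3 interpreter; the names a, b and alias are made up for the
example), two objects that are alive at the same time never share an id,
and "is" agrees with comparing ids:

```python
a = object()
b = object()
alias = a          # a second name for the same object

# Both objects are alive here, so their ids must differ.
distinct_ids = id(a) != id(b)

# Two names for one object report the same id.
same_object = (alias is a) and (id(alias) == id(a))

# id() returns an integer; its width is not specified.
id_is_int = isinstance(id(a), int)
```

Nothing in the docstring promises that an id stays unique after the object
has been destroyed.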

>   -- keys are interchangeable (equivalent),
>      if the following is valid:
> 
>          hash(key1) == hash(key2) and key1 == key2

Yes. Note that key1 == key2 must imply hash(key1) == hash(key2), but
hash(key1) == hash(key2) does not guarantee that key1 == key2. If you
define __hash__(), the built-in function hash() uses it, so you need to
be sure about the quality of your hash method. If you do not define
__hash__(), hash() falls back to a value based on id(), which is
guaranteed to be unique among simultaneously existing objects.
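
A minimal sketch of two interchangeable keys, assuming current Python 3,
where __eq__() takes the place of __cmp__(). The Point class is
hypothetical and exists only for this example:

```python
class Point:
    """Hypothetical value type used only for illustration."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Hash exactly the fields that __eq__ compares.
        return hash((self.x, self.y))


p1 = Point(1, 2)
p2 = Point(1, 2)        # a distinct object with an equal value

table = {p1: "first"}
table[p2] = "second"    # overwrites the entry: p2 is the same key as p1

entries = len(table)
value_via_p1 = table[p1]
```

Because the hash is built from the same fields that the comparison uses,
equal objects cannot end up with different hashes.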


>   -- I can ignore the 2nd statement, if I am aware of
>      the fact that: if objects are equal it doesn't mean that they are
>      the same key.

So you are introducing a scenario where different objects compare equal
according to __cmp__() but have different hashes. I don't think that is
normal: a dictionary may then fail to find an entry when you look it up
with an equal key.
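
To make that scenario concrete, here is a sketch of such a class, assuming
current Python 3 (so __eq__() stands in for __cmp__()). BrokenKey and its
per-instance serial number are hypothetical, invented for the example:

```python
import itertools


class BrokenKey:
    """Hypothetical class that deliberately violates the second rule."""

    _serial = itertools.count()

    def __init__(self, value):
        self.value = value
        self._number = next(BrokenKey._serial)   # unique per instance

    def __eq__(self, other):
        if not isinstance(other, BrokenKey):
            return NotImplemented
        return self.value == other.value

    def __hash__(self):
        # Differs between instances even when they compare equal.
        return self._number


a = BrokenKey("spam")
b = BrokenKey("spam")       # equal to a, but with a different hash

d = {a: 1}
found_by_same_object = a in d
found_by_equal_object = b in d
```

The lookup with b starts from b's hash, so it never reaches the entry
stored under a, even though a == b is true.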

If you define neither __cmp__() nor __hash__(), Python uses id() for
both, so the second rule holds automatically.

If you define __cmp__() but not __hash__(), you will most likely get a
TypeError when you use an instance as a key. If you define both, you
should follow the second rule: if I'm right, most of the internal data
structures depend on it.
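
In current Python 3 (an assumption on my part, since the rules quoted
above talk about __cmp__()), the comparable case is a class that defines
__eq__() without __hash__(). The EqOnly class below is hypothetical:

```python
class EqOnly:
    """Hypothetical class with a custom comparison but no __hash__."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if not isinstance(other, EqOnly):
            return NotImplemented
        return self.value == other.value


# Defining __eq__ without __hash__ makes Python 3 set __hash__ to None.
hash_disabled = EqOnly.__hash__ is None

try:
    table = {EqOnly(1): "value"}
    raised_type_error = False
except TypeError:
    # Instances are unhashable, so they cannot be dictionary keys.
    raised_type_error = True
```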


>   -- I can safely ignore the 3rd statement, because python
>      falls back to cmp(id(obj1), id(obj2)), if __cmp__() is not defined.

Yes. For distinct objects id(obj1) != id(obj2), so obj1 != obj2. The only
requirement left is that __hash__() returns a 32-bit integer. Personally
I would emphasize the words SHOULD NOT in the third rule: I'm sure there
are situations where it is perfectly normal to combine id-based
comparison with a custom hash. Anyway, to avoid breaking the third rule
you can simply define __cmp__() as:

def __cmp__(self, other):
    # Order by identity, mirroring the default fallback.
    return cmp(id(self), id(other))
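
For comparison, a sketch of the same idea in current Python 3, where
__cmp__() and cmp() no longer exist and identity is expressed through
__eq__() and __hash__(). Both class names are hypothetical:

```python
class Plain:
    """Hypothetical class defining neither __eq__ nor __hash__."""


class ById:
    """Hypothetical class spelling out the identity-based behaviour."""

    def __eq__(self, other):
        return self is other

    def __hash__(self):
        return id(self)


p1, p2 = Plain(), Plain()
b1, b2 = ById(), ById()

plain_table = {p1: "one", p2: "two"}    # two distinct keys
by_id_table = {b1: "one", b2: "two"}    # likewise

plain_distinct = (p1 != p2) and len(plain_table) == 2
by_id_distinct = (b1 != b2) and len(by_id_table) == 2
```

Both classes behave the same way as dictionary keys: an object is found
only when it is looked up with itself.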



-- 
Miika



