booleans, hash() and objects having the same value

Wed Jan 30 21:25:52 EST 2008

On Wed, 30 Jan 2008 17:14:32 -0800, Ryszard Szopa wrote:

> Hi all,
> 
> I've just read PEP 285 so I understand why bool inherits from int and
> why, for example,  ((False - True)*True)**False==1.

And don't think that the choice was uncontroversial.

> This was necessary for backwards compatibility 

"Necessary" is perhaps a little strong, but otherwise yes.

> and to give the beast some ability to do moral reasoning. 
> For example, Python knows to value the whole truth more
> than just a half-truth:
> 
> In [95]: True > 0.5*True
> Out[95]: True

You're trying to be funny, yes?

> Anyway, the thing that bothers me is the behavior of booleans when
> passed as argument to the hash() function... That is, hash(True) ==
> hash(1) and hash(False) == hash(0). 

How do you feel about this?

>>> hash(1.0) == hash(1)
True
>>> hash(0.0) == hash(0)
True
>>> hash(9907.0) == hash(9907)
True

It's the same thing: True is equal to 1, just as 1.0 is, and so 
hash(True) is the same as hash(1), hash(1.0) and hash(1L).

> This leads to a rather
> counterintuitive interaction with dicts:

[...]

> Out[128]: {True: '1'}

Yes, that's one of the disadvantages of having bools actually be ints: in 
the rare cases where you want bools to hash differently from ints, they 
don't. But that's no different from longs and ints hashing the same, or 
strings and unicode strings.
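
A minimal sketch of that collapse, written to run under Python 3 (so no 
1L). It is my own illustration, not the snippet elided above:

```python
# Keys that compare equal (and therefore hash alike) share one dict slot.
d = {}
d[True] = "bool"
d[1] = "int"      # 1 == True, so this overwrites the value stored under True
d[1.0] = "float"  # 1.0 == True as well, so this overwrites it again

print(d)          # {True: 'float'}
print(len(d))     # 1
```

The dict keeps the first key it saw (True) and only replaces the value.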

> You may argue that this is a rather strange use case... However, you may
> imagine that somebody would want a dict mapping from objects to their
> representations, with 0, 1 and booleans among the objects, like in:
> 
> In [123]: dict((el, repr(el)) for el in [0, 1, True, False])
> Out[123]: {0: 'False', 1: 'True'}

Why bother with such a mapping? It already exists, and it is called 
repr(). Best of all, repr() shouldn't give a KeyError, and it can take 
mutable arguments.

> In both cases, the result is rather unexpected, though after some
> thinking, understandable (`==' tests the equality of values of objects,
> True==1, and (from the documentation of hash) "Two objects with the same
> value have the same hash value"). However, is this approach really
> sound? 

Absolutely. As a general data type, the most sensible behaviour for hash 
tables is for dict[X] and dict[Y] to give the same result if X and Y are 
equal.
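
Here is a rough sketch of that rule in action, assuming Python 3. The 
Fraction lookup works because equal numbers hash alike; Sloppy is a 
hypothetical class of mine that deliberately breaks the rule:

```python
from fractions import Fraction

table = {0.5: "half", 1: "one"}

# Fraction(1, 2) == 0.5 and the two hash alike, so the lookup succeeds.
print(table[Fraction(1, 2)])        # half


class Sloppy(object):
    """Hypothetical key: compares equal to an int but hashes differently."""

    def __init__(self, n):
        self.n = n

    def __eq__(self, other):
        return self.n == other

    def __hash__(self):
        return 12345                # ignores self.n, so it disagrees with hash(1)


print(Sloppy(1) == 1)               # True
print(table.get(Sloppy(1)))         # None: equal to the key 1, but never found
```

Equal objects with unequal hashes simply get lost in the table, which is 
why hash() has to follow ==.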

> Wouldn't it be more sensible to have bool its own __hash__?

Who cares what bools hash to? The real question is, should True be equal 
to 1 (and 1.0 and 1L) or not?

The decision that it should was made a long time ago. It may or may not 
have been the best decision, but it's a decision and I doubt that it will 
be changed before Python 4000. Or possibly Python 5000.

> PEP 285 doesn't mention anything about hashing (in fact, it doesn't
> contain the string `hash' at all). Is it that nobody has noticed the
> problem, it is a well known fact usually classified as a non-problem, or
> maybe there are some serious reasons to keep 1 and True having the same
> hash value?

It's a non-problem in general. There might be highly specialized 
situations where you want 1.0 and 1 to map to different items, or 'xyz' 
and u'xyz', but being specialist, they belong in your application code 
and not in the language.

Here's a start in implementing such a thing:

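class MyDict(dict):
    # Store each item under (type(key), key), so that keys which compare
    # equal but have different types (1, 1.0, True) stay distinct.
    def __getitem__(self, key):
        key = (type(key), key)
        return super(MyDict, self).__getitem__(key)
    def __setitem__(self, key, value):
        key = (type(key), key)
        super(MyDict, self).__setitem__(key, value)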

>>> D = MyDict(); D[1] = "one"; D[1.0] = "one point oh"
>>> D[1L] = "long one"; D[True] = "True"
>>> D[1]
'one'
>>> D[True]
'True'

(I leave implementing the other necessary methods as an exercise.)
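
For the impatient, a partial sketch of that exercise, written for Python 3. 
TypedKeyDict and _wrap are names I made up for illustration. It still 
doesn't cover update(), pop(), setdefault() or the constructor, and 
iterating over it yields the wrapped (type, key) tuples:

```python
class TypedKeyDict(dict):
    """Hypothetical dict variant: keys of different types never collide."""

    @staticmethod
    def _wrap(key):
        # (int, 1), (float, 1.0) and (bool, True) are three unequal tuples.
        return (type(key), key)

    def __getitem__(self, key):
        return dict.__getitem__(self, self._wrap(key))

    def __setitem__(self, key, value):
        dict.__setitem__(self, self._wrap(key), value)

    def __delitem__(self, key):
        dict.__delitem__(self, self._wrap(key))

    def __contains__(self, key):
        return dict.__contains__(self, self._wrap(key))

    def get(self, key, default=None):
        return dict.get(self, self._wrap(key), default)


d = TypedKeyDict()
d[1] = "one"
d[1.0] = "one point oh"
d[True] = "True"

print(len(d))        # 3
print(d[1.0])        # one point oh
print(False in d)    # False
```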

> (Also, it is not completely clear what it means for two Python objects
> to "have the same value". My first intuition would be that variables may
> have a value, which usually is some Python object. 

I think that value should be interpreted rather fuzzily. I don't believe 
it is strongly defined: the concept of the value of an object depends on 
whatever the object wants it to be. For example, given an instance x with 
an attribute "foo", is x.foo part of the value of x, or is it something 
extra? Only the object x can make that decision.

However, for standard objects like strings, ints, floats, etc. the value 
of the object corresponds to the intuitive ideas about strings, ints, 
floats etc. The value of the int 5 is 5, the value of the string "xyz" is 
"xyz", and so forth.

For "well-behaved" objects, x and y have the same value when x == y 
returns True. Leave it to the objects to decide what their value is.

It's easy to create ill-behaved objects:

class Weird:
    def __eq__(self, other):
        if other is self:
            return False
        elif other is True:
            return True
        elif other == 1:
            return False
        else:
            import time
            return int(time.time()) % 2 == 0

but in general, you don't need to worry about such nonsense objects.
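
One ill-behaved value does ship with the language, though: a float NaN is 
not equal to itself. A small sketch of how that interacts with containers, 
assuming Python 3:

```python
nan = float("nan")

print(nan == nan)            # False: NaN never compares equal, even to itself
print(nan in [nan])          # True: containers check identity before equality

d = {nan: "found"}
print(d[nan])                # found: the very same object, matched by identity
print(float("nan") in d)     # False: a different NaN object is not equal
```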

> The second intuition
> would be that objects with compatible (i.e. one inherits from the other)
> types and ==-equal dicts have the same value. However, this is
> _sometimes_ true. 

Python rarely cares about the type of objects, only the behaviour. 
Inheritance doesn't come into it, except as one possible way to get that 
behaviour:

class MyInt:  # DON'T inherit from int
    def __init__(self, value):
        if value == 'one': a, b = 0, 1
        elif value == 'two': a, b = 1, 1
        elif value == 'three': a, b = 1, 2
        else:
            raise ValueError("can't count that high")
        self.data = (a, b)
    def __eq__(self, other):
        return other == sum(self.data)

Instances of MyInt have the same value as the ints 1, 2, or 3 as 
appropriate. In all other ways though, MyInt and int behave very 
differently: for example, you can't add MyInts.
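
A quick check of both claims, assuming Python 3. The class is repeated so 
the snippet stands alone; the trailing lines are my additions:

```python
class MyInt(object):  # still not inheriting from int
    def __init__(self, value):
        if value == 'one':
            a, b = 0, 1
        elif value == 'two':
            a, b = 1, 1
        elif value == 'three':
            a, b = 1, 2
        else:
            raise ValueError("can't count that high")
        self.data = (a, b)

    def __eq__(self, other):
        return other == sum(self.data)


two = MyInt('two')
print(two == 2)       # True
print(2 == two)       # True: int gives up and Python tries MyInt.__eq__
print(two == 3)       # False

try:
    two + two
except TypeError:
    print("no addition for MyInts")
```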

-- 
Steven