Rich Comparisons Gotcha

Sun Dec 7 18:51:45 EST 2008

On Sun, 07 Dec 2008 13:57:54 -0800, James Stroud wrote:

> Rasmus Fogh wrote:
>> Current behaviour is both inconsistent and counterintuitive, as these
>> examples show.
>> 
>>>>> x = float('NaN')
>>>>> x == x
>> False
> 
> Perhaps this should raise an exception?

Why on earth would you want checking equality on NaN to raise an 
exception??? What benefit does it give?
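
For reference, here is a minimal sketch of what NaN comparisons do today, with no exception involved. The variable names are mine, purely for illustration; math.isnan is the standard-library test for NaN.

```python
import math

nan = float("nan")

is_float = type(nan) is float        # NaN is an ordinary float object
equal_to_itself = (nan == nan)       # IEEE 754 comparison: False
unequal_to_itself = (nan != nan)     # True, which is unique to NaN
detected = math.isnan(nan)           # the explicit way to test for NaN

print(is_float, equal_to_itself, unequal_to_itself, detected)
```

Code that needs to know whether it holds a NaN can ask directly with math.isnan, rather than relying on an equality test to blow up.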

> I think the problem is not with
> comparisons in general but with the fact that nan is type float:
> 
> py> type(float('NaN'))
> <type 'float'>
> 
> No float can be equal to nan, but nan is a float. How can something be
> not a number and a float at the same time? 

Because floats are not real numbers. They are *almost* numbers: they 
often (but not always) behave like numbers, but they're actually not 
numbers.

The difference is subtle enough that it is easy to forget that floats are 
not numbers, but it's easy enough to find examples proving it:

Some perfectly good numbers don't exist as floats:

>>> 2**-10000 == 0.0
True
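
A rough sketch of the same point, using exact rational arithmetic for comparison. The names exact and as_float are just illustrative.

```python
from fractions import Fraction
import sys

exact = Fraction(1, 2**10000)   # the real number 2**-10000, held exactly
as_float = 2**-10000            # the same expression evaluated as a float

print(exact > 0)                # the true value is positive
print(as_float == 0.0)          # the float result has underflowed to zero
print(sys.float_info.min)       # smallest positive normalized float
```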

Try as you might, you can't get the number 0.1 *exactly* as a float:

>>> 0.1
0.10000000000000001
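
If you want to see exactly which number is stored in place of 0.1, the fractions and decimal modules will convert a float without further rounding. A small sketch (the names stored and wanted are mine):

```python
from decimal import Decimal
from fractions import Fraction

stored = Fraction(0.1)      # exact value of the float closest to 0.1
wanted = Fraction(1, 10)    # the real number one tenth

print(stored == wanted)               # False: they are different numbers
print(stored.denominator == 2**55)    # True: a float is a binary fraction
print(stored > wanted)                # True: the float is slightly too big
print(Decimal(0.1))                   # full decimal expansion of the float
```

One tenth has no finite binary expansion, so the nearest binary fraction is the best a float can do.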

For any real numbers x and y, with y not equal to zero, x+y != x. But 
that fails for floats:

>>> 1001.0 + 1e99 == 1e99
True
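
To get a feel for the scale involved, the sketch below estimates the gap between neighbouring floats near 1e99 and compares 1e99 with the largest finite float. The estimate (value times machine epsilon) is only good to within a factor of two, and the variable names are illustrative.

```python
import sys

big = 1e99
small = 1001.0

# Rough estimate of the spacing between adjacent floats near `big`.
gap = big * sys.float_info.epsilon

print(gap)                        # on the order of 1e83
print(small < gap / 2)            # 1001 is far less than half that spacing
print(big + small == big)         # so the sum rounds back to `big`
print(big < sys.float_info.max)   # and `big` is well inside the float range
```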

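The above is not overflow (1e99 is far below the largest float); it is 
rounding. The gap between adjacent floats near 1e99 is vastly larger than 
1001, so the sum rounds straight back to 1e99. But rounding bites at 
everyday magnitudes too. With a little effort, you can also find examples 
of "ordinary sized" floats where (x+y)-y != x.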

>>> 0.9+0.1-0.9 == 0.1
False
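
A sketch of where that goes wrong: repeat the calculation with exact rationals built from the very same float values, and the identity holds. The damage is done by rounding the intermediate sum. Names here are illustrative.

```python
from fractions import Fraction

x, y = 0.1, 0.9

float_result = (y + x) - y                                # rounded at each step
exact_result = (Fraction(y) + Fraction(x)) - Fraction(y)  # no rounding at all

print(float_result == x)             # False
print(exact_result == Fraction(x))   # True
print(y + x == 1.0)                  # True: the sum was rounded to exactly 1.0
```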

>>>>> import numpy
>>>>> y = numpy.zeros((3,))
>>>>> y
>> array([ 0.,  0.,  0.])
>>>>> bool(y==y)
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> ValueError: The truth value of an array with more than one element is
>> ambiguous. Use a.any() or a.all()
> 
> But the equality test is not what fails here. It's the cast to bool that
> fails

And it is right to do so, because it is ambiguous and the library 
designers rightly avoided the temptation of guessing what result is 
needed.
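
A short sketch of the two unambiguous questions the caller can ask instead, assuming numpy is installed. The names elementwise and raised are mine.

```python
import numpy

y = numpy.zeros((3,))
elementwise = (y == y)        # an array of three booleans, not a single bool

print(elementwise)
print(elementwise.all())      # are all elements equal?
print(elementwise.any())      # is at least one element equal?

# Asking for a single truth value is what raises, not the comparison.
try:
    bool(elementwise)
    raised = False
except ValueError:
    raised = True
print(raised)
```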

>>>>> ll1 = [y,1]
>>>>> y in ll1
>> True
>>>>> ll2 = [1,y]
>>>>> y in ll2
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> ValueError: The truth value of an array with more than one element is
>> ambiguous. Use a.any() or a.all()
> 
> I think you could be safe calling this a bug with numpy. 

Only in the sense that there are special cases where the array elements 
are all true, or all false, and numpy *could* safely return a bool. But 
special cases are not special enough to break the rules. Better for the 
numpy caller to write this:

a.all()  # or a.any()

instead of:

try:
    result = bool(a)
except ValueError:
    result = a.all()

as they would need to do if numpy sometimes returned a bool and sometimes 
raised an exception.
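
As for why `y in ll1` worked and `y in ll2` did not: list containment checks identity first, then falls back to == and takes the truth value of the result. The sketch below reproduces that with a toy class. Elementwise is a hypothetical stand-in I made up for illustration; it is not how numpy is implemented.

```python
class Elementwise:
    """Hypothetical stand-in for an array type (not numpy's implementation)."""

    def __init__(self, values):
        self.values = list(values)

    def __eq__(self, other):
        # Compare element by element and return another Elementwise,
        # the way an array comparison returns an array.
        if isinstance(other, Elementwise):
            return Elementwise(a == b for a, b in zip(self.values, other.values))
        return Elementwise(a == other for a in self.values)

    def __bool__(self):
        # Refuse to guess when there is more than one element.
        if len(self.values) != 1:
            raise ValueError("truth value is ambiguous")
        return bool(self.values[0])


y = Elementwise([0.0, 0.0, 0.0])

# y is the first item, so the identity check succeeds and == is never called.
found_first = y in [y, 1]

# Here 1 is compared with y first; the result is an Elementwise whose
# truth value is ambiguous, so the containment test raises.
try:
    y in [1, y]
    second_raised = False
except ValueError:
    second_raised = True

print(found_first, second_raised)
```

So the order dependence comes from the identity shortcut in containment tests, combined with the refusal to guess a truth value.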

-- 
Steven