Rich Comparisons Gotcha
Steven D'Aprano
steve at REMOVE-THIS-cybersource.com.au
Sun Dec 7 18:51:45 EST 2008
On Sun, 07 Dec 2008 13:57:54 -0800, James Stroud wrote:
> Rasmus Fogh wrote:
>> Current behaviour is both inconsistent and counterintuitive, as these
>> examples show.
>>
>> >>> x = float('NaN')
>> >>> x == x
>> False
>
> Perhaps this should raise an exception?
Why on earth would you want checking equality on NaN to raise an
exception??? What benefit does it give?
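As a minimal sketch of why no exception is needed (assuming Python 2.6 or later, where math.isnan is available): the comparison quietly returns False, and code that cares about NaN can ask for it explicitly.

```python
import math

nan = float('nan')

# Comparing a NaN never raises; an equality test simply comes out False.
print(nan == nan)       # False
print(nan != nan)       # True

# Code that needs to detect a NaN can test for it explicitly.
print(math.isnan(nan))  # True
```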
> I think the problem is not with
> comparisons in general but with the fact that nan is type float:
>
> py> type(float('NaN'))
> <type 'float'>
>
> No float can be equal to nan, but nan is a float. How can something be
> not a number and a float at the same time?
Because floats are not real numbers. They are *almost* numbers, they
often (but not always) behave like numbers, but they're actually not
numbers.
The difference is subtle enough that it is easy to forget that floats are
not numbers, but it's easy enough to find examples proving it:
Some perfectly good numbers don't exist as floats:
>>> 2**-10000 == 0.0
True
Try as you might, you can't get the number 0.1 *exactly* as a float:
>>> 0.1
0.10000000000000001
For any number x and any y not equal to zero, x+y != x. But that fails
for floats:
>>> 1001.0 + 1e99 == 1e99
True
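The above is because floats have limited precision: 1001.0 is far smaller
than the gap between 1e99 and its neighbouring floats, so it is rounded
away. But even keeping the operands close in magnitude doesn't solve the
problem. With a little effort, you can also find examples of "ordinary
sized" floats where (x+y)-y != x.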
>>> 0.9+0.1-0.9 == 0.1
False
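One way to see what the floats in these examples really hold is to convert them to exact rationals. This is only a sketch, assuming Python 2.7 or 3.x, where Fraction accepts a float directly; the names stored and big are made up for the illustration.

```python
from fractions import Fraction

# Fraction(float) converts the float's stored value exactly, with no rounding.
stored = Fraction(0.1)
print(stored == Fraction(1, 10))   # False: the float 0.1 is not one tenth

# Exact rational arithmetic keeps the small term that the float sum drops.
big = Fraction(1e99)
print(big + 1001 != big)           # True
print(1001.0 + 1e99 == 1e99)       # True: 1001.0 is rounded away
```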
>> >>> import numpy
>> >>> y = numpy.zeros((3,))
>> >>> y
>> array([ 0., 0., 0.])
>> >>> bool(y==y)
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> ValueError: The truth value of an array with more than one element is
>> ambiguous. Use a.any() or a.all()
>
> But the equality test is not what fails here. It's the cast to bool that
> fails
And it is right to do so, because it is ambiguous and the library
designers rightly avoided the temptation of guessing what result is
needed.
>> >>> ll1 = [y,1]
>> >>> y in ll1
>> True
>> >>> ll2 = [1,y]
>> >>> y in ll2
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> ValueError: The truth value of an array with more than one element is
>> ambiguous. Use a.any() or a.all()
>
> I think you could be safe calling this a bug with numpy.
Only in the sense that there are special cases where the array elements
are all true, or all false, and numpy *could* safely return a bool. But
special cases are not special enough to break the rules. Better for the
numpy caller to write this:
    a.all()  # or any()
instead of:
    try:
        bool(a)
    except ValueError:
        a.all()
as they would need to do if numpy sometimes returned a bool and sometimes
raised an exception.
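The difference between the two quoted "in" tests follows from how list membership works: each item is checked by identity first, and only then with ==, whose result is passed through bool(). A rough sketch with stand-in classes (ArrayLike and AmbiguousResult are invented here so that the example runs without numpy; they are not numpy's API):

```python
class AmbiguousResult(object):
    """Hypothetical stand-in for the array that numpy's == returns."""
    def __bool__(self):
        raise ValueError("truth value is ambiguous")
    __nonzero__ = __bool__  # Python 2 spelling of the same hook


class ArrayLike(object):
    """Hypothetical stand-in for a numpy array with more than one element."""
    def __eq__(self, other):
        # Like numpy, hand back an element-wise result rather than a bool.
        return AmbiguousResult()


y = ArrayLike()

# The first item *is* y, so the identity shortcut answers before == runs.
print(y in [y, 1])   # True

# Here the first item is 1, so == is called and its result goes to bool().
try:
    y in [1, y]
except ValueError as exc:
    print("ValueError: %s" % exc)
```

So the result depends on whether the search reaches an item that is not the array itself before it reaches the array.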
--
Steven