[Python-Dev] Why is nan != nan?

Fri Mar 26 01:57:38 CET 2010

On Thu, 25 Mar 2010 06:26:11 am Mark Dickinson wrote:

> Here's an interesting recent blog post on this subject, from the
> creator of Eiffel:
>
> http://bertrandmeyer.com/2010/02/06/reflexivity-and-other-pillars-of-civilization/

Sorry, but he lost me right at the beginning when he quoted someone 
else:

    "there is no reason to believe that the result of one 
    calculation with unclear value should match that of 
    another calculation with unclear value" 

and then argued:

    "The exact same argument can be used to assert that the 
    result should not be False:  

    … there is no reason to believe that the result of one 
    calculation with unclear value should not match that of 
    another calculation with unclear value.

    Just as convincing! Both arguments complement each other: 
    there is no compelling reason for demanding that the 
    values be equal; and there is no compelling argument either 
    to demand that they be different. If you ignore one of the 
    two sides, you are biased."

This whole argument is invalid on at least three levels. I'll get the 
first two out of the way briefly.

#1: Bertrand starts by treating NANs as "unclear values", and concludes 
that we shouldn't prefer "two unclear values are different" as more 
compelling than "two unclear values are the same". But this is 
ridiculous -- if you ask me a pair of questions, and I answer "I'm not 
sure" to both of them, why would you assume that the right answer to 
both questions is actually the same?

#2: But in fact NANs aren't "unclear values"; they are not values at 
all. The answer to "what is the non-complex logarithm of -1?" is 
not "I'm not sure" but "there is no such value". Bertrand spends an 
awful lot of time trying to demonstrate why the reflexivity of equality 
(every x is equal to itself) should apply to NANs as well as the other 
floats, but reflexivity is a property of equivalence relations between 
values, and it does not (and should not) hold for "there is no such 
value".

By analogy: the Lizard King of Russia does not exist; the Vampire Queen 
of New Orleans also does not exist. We don't therefore conclude that 
the Lizard King and the Vampire Queen are the same person.

#3: We could, if we wish, violate the IEEE standard and treat equality 
of NANs as an equivalence relation. It's our language, we're free to 
follow whatever standards we like, and reflexivity of equality is a 
very useful axiom to have. Since it applies to all non-NAN floats (and 
virtually every object in Python, other than those with funny __eq__ 
methods), perhaps we should extend it to NANs as well?

I hope to convince you that the cure is worse than the disease. Since 
NANs are usually found in mathematical contexts, we should follow the 
IEEE standard even at the cost of rare anomalies in non-mathematical 
code containing NANs.

Simply put: we should treat "two unclear values are different" as more 
compelling than "two unclear values are the same" as it leads to fewer, 
smaller, errors. Consider:

log(-1) = NAN  # maths equality, not assignment
log(-2) = NAN

If we allow NAN = NAN, then we permit the error:

log(-1) = NAN = log(-2)
therefore log(-1) = log(-2)
and 1 = 2

But if we make NAN != NAN, then we get:

log(-1) != log(-2)

and all of mathematics does not collapse into a pile of rubble. I think 
that is a fairly compelling reason to prefer inequality over equality.
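
A rough Python sketch of the same point. Note that math.log(-1) raises 
ValueError rather than returning a NAN, so the example uses a made-up 
helper, nan_log, which returns a NAN for negative input the way an 
IEEE-style log would. The helper and its behaviour are assumptions for 
illustration only, not anything in the standard library:

```python
import math

def nan_log(x):
    """Hypothetical IEEE-style log: NAN for negative input, not ValueError."""
    if x < 0:
        return float("nan")
    return math.log(x)

a = nan_log(-1)
b = nan_log(-2)

# Both results are NANs...
print(math.isnan(a), math.isnan(b))
# ...but they do not compare equal, so nothing lets us infer -1 == -2.
print(a == b)
print(a != b)
```

Under IEEE comparison rules, the equality test is False and the 
inequality test is True, so the bogus chain of reasoning never starts.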

One objection might be that while log(-1) and log(-2) should be 
considered different NANs, surely NANs should be equal to themselves?

-1 = -1
implies log(-1) = log(-1)

But consider the practicalities: there are far more floats than 
available NAN payloads. We simply can't map every invalid calculation 
to a unique NAN, and therefore there *must* be cases like:

log(-123.456789e-8) = log(-9.876e47)
implies -123.456789e-8 = -9.876e47

So we mustn't consider NANs equal just because their payloads are equal.
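
To see that equal payloads don't give equal NANs in practice, here is a 
small sketch using the struct module to look at the raw 64-bit pattern 
of a float. The helper names float_bits and float_from_bits are 
invented for this example, and it assumes the platform stores Python 
floats as IEEE 754 doubles:

```python
import struct

def float_bits(value):
    """Return the 64-bit pattern of a float as an unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", value))[0]

def float_from_bits(bits):
    """Build a float from a 64-bit unsigned integer pattern."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

x = float("nan")
# A second NAN built from exactly the same bits, payload included.
y = float_from_bits(float_bits(x))

same_bits = float_bits(x) == float_bits(y)
print(hex(float_bits(x)))
print(same_bits)   # the bit patterns match
print(x == y)      # the floats still compare unequal
```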

What about identity? Even if we don't dare allow this:

x = log(-1)  # assignment
y = log(-1)  # another NAN with the same payload
assert x is not y
assert x == y

surely we can allow this?

assert x == x

But this is dangerous. Don't be fooled by the simplicity of the above 
example. Just because you have two references to the same (as in 
identity) NAN, doesn't mean they represent "the same thing" or came 
from the same place:

data = [1, 2, float('nan'), float('nan'), 3]
x = harmonic_mean(data)
y = 1 - geometric_mean(data)

It is an accident of implementation whether x and y happen to be the 
same object or not. Why should their inequality depend on such a 
fragile thing?
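
For anyone who wants to try this, here is a minimal runnable version. 
The two mean functions are naive stand-ins I have written for the 
example (they are not the real implementations anyone would use), and 
they simply let the NANs in the data propagate through the arithmetic:

```python
import math

def harmonic_mean(values):
    """Naive harmonic mean; any NAN in the data propagates to the result."""
    return len(values) / sum(1 / v for v in values)

def geometric_mean(values):
    """Naive geometric mean via logs; any NAN propagates to the result."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

data = [1, 2, float('nan'), float('nan'), 3]
x = harmonic_mean(data)
y = 1 - geometric_mean(data)

print(math.isnan(x), math.isnan(y))
# Whether x and y are the same object is up to the implementation.
print(x is y)
# The comparison itself does not depend on that accident.
print(x == y)
```

Both results are NANs from unrelated calculations, and the equality test 
gives False whichever way the identity test happens to come out.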

In fact, identity of NANs is itself an implementation quirk of 
programming languages like Python: logically, NANs don't have identity 
at all.

To put it another way: all ONEs are the same ONE, even if they come from 
different sources, are in different memory locations, or have different 
identities; but all NANs are different, even if they come from the same 
source, are in the same memory location, or have the same identity.

The fundamental problem here is that NANs are not values. If you treat 
them as if they were values, then you want reflexivity of equality. But 
they're not -- they're *signals* for "your calculation has gone screwy 
and the result you get is garbage", so to speak. You shouldn't even 
think of a specific NAN as a piece of specific garbage, but merely a 
label on the *kind* of garbage you've got (the payload): INF-INF is, in 
some sense, a different kind of error to log(-1).

In the same way you might say "INF-INF could be any number at all, 
therefore we return NAN", you might say "since INF-INF could be 
anything, there's no reason to think that INF-INF == INF-INF."
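
This is what Python floats already do, as a quick check shows (plain 
float arithmetic only, nothing assumed beyond IEEE 754 doubles):

```python
import math

inf = float("inf")
x = inf - inf          # an invalid operation, so the result is a NAN

print(math.isnan(x))
print(x is x)          # identity holds: it is one object
print(x == x)          # equality does not: a NAN is not equal to itself
print(x != x)
print(inf - inf == inf - inf)
```

So identity and equality part ways for NANs: the identity test is True, 
while both equality tests are False.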

-- 
Steven D'Aprano