[Python-Dev] Why is nan != nan?
Steven D'Aprano
steve at pearwood.info
Fri Mar 26 01:57:38 CET 2010
On Thu, 25 Mar 2010 06:26:11 am Mark Dickinson wrote:
> Here's an interesting recent blog post on this subject, from the
> creator of Eiffel:
>
> http://bertrandmeyer.com/2010/02/06/reflexivity-and-other-pillars-of-civilization/
Sorry, but he lost me right at the beginning when he quoted someone
else:
"there is no reason to believe that the result of one
calculation with unclear value should match that of
another calculation with unclear value"
and then argued:
"The exact same argument can be used to assert that the
result should not be False:
… there is no reason to believe that the result of one
calculation with unclear value should not match that of
another calculation with unclear value.
Just as convincing! Both arguments complement each other:
there is no compelling reason for demanding that the
values be equal; and there is no compelling argument either
to demand that they be different. If you ignore one of the
two sides, you are biased."
This whole argument is invalid on at least three levels. I'll get the
first two out of the way briefly.
#1: Bertrand starts by treating NANs as "unclear values", and concludes
that we shouldn't prefer "two unclear values are different" as more
compelling than "two unclear values are the same". But this is
ridiculous -- if you ask me a pair of questions, and I answer "I'm not
sure" to both of them, why would you assume that the right answer to
both questions is actually the same?
#2: But in fact NANs aren't "unclear values", they are not values at
all. The answer to "what is the non-complex logarithm of -1?" is
not "I'm not sure" but "there is no such value". Bertrand spends an
awful lot of time trying to demonstrate why the reflexivity of equality
(every x is equal to itself) should apply to NANs as well as the other
floats, but reflexivity is a property of equivalence relations, and it
does not (and should not) hold for "there is no such value".
By analogy: the Lizard King of Russia does not exist; the Vampire Queen
of New Orleans also does not exist. We don't therefore conclude that
the Lizard King and the Vampire Queen are the same person.
#3: We could, if we wish, violate the IEEE standard and treat equality
of NANs as an equivalence relation. It's our language, we're free to
follow whatever standards we like, and reflexivity of equality is a
very useful axiom to have. Since it applies to all non-NAN floats (and
virtually every object in Python, other than those with funny __eq__
methods), perhaps we should extend it to NANs as well?
I hope to convince you that the cure would be worse than the
disease. Since NANs are usually found in mathematical contexts, we
should follow the IEEE standard even at the cost of rare anomalies in
non-mathematical code containing NANs.
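To make the "rare anomalies" concrete, here is a small sketch of how a
NAN behaves inside ordinary Python containers. It assumes CPython-style
containers, which check identity before falling back to ==; the variable
names are mine, purely for illustration.

```python
nan = float("nan")

# IEEE 754 comparison: a NaN is not equal to anything, itself included.
self_equal = (nan == nan)

# Containers check identity first, so the very same object is "found"...
same_object_found = nan in [nan]
lists_equal = ([nan] == [nan])
count_same = [nan].count(nan)

# ...while a separately created NaN is not.
other_object_found = float("nan") in [nan]

print(self_equal, same_object_found, lists_equal, count_same, other_object_found)
```

So membership tests and list equality can disagree with == when a NAN
is involved. That is the sort of oddity non-mathematical code may run
into if we keep the IEEE behaviour.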
Simply put: we should treat "two unclear values are different" as more
compelling than "two unclear values are the same" as it leads to fewer,
smaller, errors. Consider:
log(-1) = NAN # maths equality, not assignment
log(-2) = NAN
If we allow NAN = NAN, then we permit the error:
log(-1) = NAN = log(-2)
therefore log(-1) = log(-2)
and 1 = 2
But if we make NAN != NAN, then we get:
log(-1) != log(-2)
and all of mathematics does not collapse into a pile of rubble. I think
that is a fairly compelling reason to prefer inequality over equality.
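A rough sketch of the same point in Python. Note that math.log() raises
ValueError for negative input instead of returning a NAN, so the two
"results" below are stand-ins built with float("nan"); the names
log_minus_1 and log_minus_2 are mine, not anything from the math module.

```python
import math

# In Python, math.log refuses negative real input outright.
try:
    math.log(-1)
except ValueError as exc:
    log_error = exc

# Stand-ins for what an IEEE-style log(-1) and log(-2) would return.
log_minus_1 = float("nan")
log_minus_2 = float("nan")

equal = (log_minus_1 == log_minus_2)     # the step "NAN = NAN" is refused
unequal = (log_minus_1 != log_minus_2)

# Ordered comparisons involving a NaN are false as well.
less = (log_minus_1 < log_minus_2)
greater = (log_minus_1 > log_minus_2)

print(equal, unequal, less, greater)
```

Because the equality test fails, there is no chain of "equal" results
from which log(-1) = log(-2) could be derived.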
One objection might be that while log(-1) and log(-2) should be
considered different NANs, surely NANs should be equal to themselves?
-1 = -1
implies log(-1) = log(-1)
But consider the practicalities: there are far more floats than
available NAN payloads. We simply can't map every invalid calculation
to a unique NAN, and therefore there *must* be cases like:
log(-123.456789e-8) = log(-9.876e47)
implies 123.456789e-8 = 9.876e47
So we mustn't consider NANs equal just because their payloads are equal.
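The counting argument can be checked with a little arithmetic, assuming
IEEE 754 binary64 doubles (which is what Python floats are on common
platforms). The helper double_from_bits is my own illustration, and the
two bit patterns are arbitrary quiet NANs with different payloads.

```python
import math
import struct

def double_from_bits(bits):
    """Reinterpret a 64-bit unsigned integer as an IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

# Exponent bits all ones plus a non-zero mantissa encodes a NaN.
nan_a = double_from_bits(0x7FF8000000000001)
nan_b = double_from_bits(0x7FF8000000000002)

# A NaN has 52 mantissa bits that must not all be zero, plus a sign bit.
nan_patterns = 2 * (2**52 - 1)
non_nan_patterns = 2**64 - nan_patterns

print(math.isnan(nan_a), math.isnan(nan_b))
print(nan_a == nan_b, nan_a == nan_a)
print(non_nan_patterns // nan_patterns)
```

Non-NAN bit patterns outnumber NAN bit patterns by roughly two thousand
to one, so distinct invalid calculations are bound to share payloads.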
What about identity? Even if we don't dare allow this:
x = log(-1) # assignment
y = log(-1) # another NAN with the same payload
assert x is not y
assert x == y
surely we can allow this?
assert x == x
But this is dangerous. Don't be fooled by the simplicity of the above
example. Just because you have two references to the same (as in
identity) NAN doesn't mean they represent "the same thing" or came
from the same place:
data = [1, 2, float('nan'), float('nan'), 3]
x = harmonic_mean(data)
y = 1 - geometric_mean(data)
It is an accident of implementation whether x and y happen to be the
same object or not. Why should their inequality depend on such a
fragile thing?
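Here is a sketch of how that accident can arise. Both functions are
hypothetical stand-ins (they are not harmonic_mean or geometric_mean,
and not taken from any library): one hands back a shared module-level
NAN object, the other lets the NAN fall out of the arithmetic.

```python
import math

NAN = float("nan")  # hypothetical shared constant

def mean_shared(data):
    """Return the shared NAN object if any item is a NaN (illustrative only)."""
    if any(math.isnan(v) for v in data):
        return NAN
    return sum(data) / len(data)

def mean_computed(data):
    """Let the NaN propagate through ordinary float arithmetic."""
    return sum(data) / len(data)

data = [1, 2, float("nan"), float("nan"), 3]

x = mean_shared(data)
y = mean_shared(data)
z = mean_computed(data)

print(x is y)             # same object, by an implementation choice
print(x == y, x == z)     # IEEE comparison ignores identity
print(math.isnan(z))
```

Whether two results share an object depends entirely on how the
functions happen to be written. With IEEE semantics that detail cannot
leak into the outcome of ==.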
In fact, identity of NANs is itself an implementation quirk of
programming languages like Python: logically, NANs don't have identity
at all.
To put it another way: all ONEs are the same ONE, even if they come from
different sources, are in different memory locations, or have different
identities; but all NANs are different, even if they come from the same
source, are in the same memory location, or have the same identity.
The fundamental problem here is that NANs are not values. If you treat
them as if they were values, then you want reflexivity of equality. But
they're not -- they're *signals* for "your calculation has gone screwy
and the result you get is garbage", so to speak. You shouldn't even
think of a specific NAN as a piece of specific garbage, but merely a
label on the *kind* of garbage you've got (the payload): INF-INF is, in
some sense, a different kind of error to log(-1).
In the same way you might say "INF-INF could be any number at all,
therefore we return NAN", you might say "since INF-INF could be
anything, there's no reason to think that INF-INF == INF-INF."
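In Python terms, a minimal sketch (plain floats, nothing beyond the
standard math module):

```python
import math

inf = float("inf")

a = inf - inf
b = inf - inf

both_nan = math.isnan(a) and math.isnan(b)
equal = (a == b)
self_equal = (a == a)

print(both_nan, equal, self_equal)
```

If you want to know whether a result is a NAN, ask math.isnan() rather
than comparing with ==.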
--
Steven D'Aprano