float("nan") in set or as key

Thu Jun 2 05:54:30 EDT 2011

On Wed, 01 Jun 2011 21:41:06 +0100, Nobody wrote:

> On Sun, 29 May 2011 23:31:19 +0000, Steven D'Aprano wrote:
> 
>>> That's overstating it. There's a good argument to be made for raising
>>> an exception.
>> 
>> If so, I've never heard it, and I cannot imagine what such a good
>> argument would be. Please give it.
> 
> Exceptions allow you to write more natural code by ignoring the awkward
> cases. E.g. writing "x * y + z" rather than first determining whether "x
> * y" is even defined then using a conditional.

You've quoted me out of context. I wasn't asking for justification for 
exceptions in general. There's no doubt that they're useful. We were 
specifically talking about NAN == NAN raising an exception rather than 
returning False.

>>> Bear in mind that an exception is not necessarily an error, just an
>>> "exceptional" condition.
>> 
>> True, but what's your point? Testing two floats for equality is not an
>> exceptional condition.
> 
> NaN itself is an exceptional condition which arises when a result is
> undefined or not representable. When an operation normally returns a
> number but a specific case cannot do so, it returns not-a-number.

I'm not sure what "not representable" is supposed to mean, but if by 
"undefined" you mean "invalid", then correct.

> The usual semantics for NaNs are practically identical to those for
> exceptions. If any intermediate result in a floating-point expression is
> NaN, the overall result is NaN. 

Not necessarily. William Kahan gives an example where passing a NAN to 
hypot can justifiably return INF instead of NAN. While it's certainly 
true that an intermediate NAN *usually* results in a NAN, that's not a 
guarantee or requirement of the standard. A function is allowed to 
convert NANs back to non-NANs, if it is appropriate for that function.
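Python's own math.hypot behaves this way. A quick check, assuming a standard CPython build (this follows the C99 rule that hypot with an infinite argument is infinite even when the other argument is a NAN):

```python
import math

nan = float("nan")
inf = float("inf")

# However the NAN might be resolved, a triangle with an infinitely long
# side has an infinite hypotenuse, so the NAN is dropped and INF returned.
print(math.hypot(inf, nan))   # inf
# With no infinity present, the NAN propagates as usual.
print(math.hypot(3.0, nan))   # nan
```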

Another example is the Kronecker delta:

def kronecker(x, y):
    # Kronecker delta: 1 when the arguments compare equal, otherwise 0.
    if x == y:
        return 1
    return 0

This will correctly consume NAN arguments. If either x or y is a NAN, it 
will return 0.

(As an aside, this demonstrates that having NAN != any NAN, including 
itself, is useful, as kronecker(x, x) will return 0 if x is a NAN.)
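To see this concretely, here is the same function exercised with NAN arguments (repeated so the snippet runs on its own):

```python
def kronecker(x, y):
    # Kronecker delta: 1 when the arguments compare equal, otherwise 0.
    return 1 if x == y else 0

nan = float("nan")
print(kronecker(1.0, 1.0))   # 1
print(kronecker(nan, 1.0))   # 0: the NAN is consumed, not propagated
print(kronecker(nan, nan))   # 0: NAN != NAN, even for the same object
```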

> Similarly, if any intermediate
> calculation throws an exception, the calculation as a whole throws an
> exception.

This is certainly true: the exception cannot look into the future and 
see that it isn't needed because a later calculation cancels it out.

Exceptions, or hardware traps, stop the calculation. NANs allow the 
calculation to proceed. Both behaviours are useful, and the standard 
allows for both.
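Python itself happens to use both behaviours, which makes the contrast easy to see. A small sketch, assuming CPython's usual semantics: the math module raises on a domain error, while plain float arithmetic on infinities quietly yields a NAN and carries on.

```python
import math

inf = float("inf")

# Trapping style: math.sqrt signals a domain error by raising ValueError,
# which stops the calculation at that point.
try:
    math.sqrt(-1.0)
except ValueError as exc:
    print("stopped:", exc)

# Non-stop style: an invalid float operation produces a NAN instead.
x = inf - inf          # invalid operation, result is nan
y = x * 2.0 + 1.0      # the calculation continues; y is still nan
print(y, math.isnan(y))
```

(Note that Python's float division by zero raises ZeroDivisionError rather than returning INF or NAN, so Python does not follow the non-stop behaviour everywhere.)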

> If x is NaN, then "x + y" is NaN, "x * y" is NaN, pretty much anything
> involving x is NaN. By this reasoning both "x == y" and "x != y" should
> also be NaN. 

NAN is a sentinel for an invalid operation. NAN + NAN returns a NAN 
because it is an invalid operation, not because NANs are magical goop 
that spoil everything they touch.

For example, print(NAN) does not return a NAN or raise an exception, nor 
is there any need for it to. Slightly more esoteric: the signbit and 
copysign functions both accept NANs without necessarily returning NANs.
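In Python terms, a sketch of both points. Python's math module has no signbit function, so math.copysign(1.0, x) is used here as the usual way to read a float's sign; the sign bit of float("nan") may differ between platforms, so only the magnitude of the result is certain.

```python
import math

nan = float("nan")

# Converting a NAN to text is an ordinary, well-defined operation.
print(str(nan))            # nan

# copysign takes only the sign of its second argument, so a NAN there
# does not produce a NAN result: r is 1.0 or -1.0 depending on the NAN's sign bit.
r = math.copysign(1.0, nan)
print(r, math.isnan(r))
```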

Equality comparison is another such function. There's no need for 
NAN == NAN to fail, because the equality operation is perfectly well 
defined for NANs.

> But only the floating-point types have a NaN value, while
> bool doesn't. However, all types have exceptions.

What relevance does bool have? 

>>>> The correct answer to "nan == nan" is False, they are not equal.
>>> 
>>> There is no correct answer to "nan == nan".
>> 
>> Why on earth not?
> 
> Why should there be a correct answer? What does NaN actually mean?

NAN means "this is a sentinel marking that an invalid calculation was 
attempted". For the purposes of numeric calculation, it is often useful 
to allow those sentinels to propagate through your calculation rather 
than to halt the program, perhaps because you hope to find that the 
invalid marker ends up not being needed and can be ignored, or because 
you can't afford to halt the program.
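A minimal sketch of that pattern, using made-up data: each invalid result becomes a NAN sentinel, the loop never halts, and the caller decides afterwards what to do with the sentinels (here, simply dropping them).

```python
import math

inf = float("inf")

# Hypothetical input: the middle pair is an invalid operation (inf / inf).
pairs = [(1.0, 2.0), (inf, inf), (3.0, 4.0)]

# The invalid ratio becomes a NAN sentinel; the comprehension runs to completion.
ratios = [a / b for a, b in pairs]

# Later, the sentinel turns out not to be needed and is filtered out.
valid = [r for r in ratios if not math.isnan(r)]
print(ratios)   # [0.5, nan, 0.75]
print(valid)    # [0.5, 0.75]
```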

Does INVALID == INVALID? There's no reason to think that the question 
itself is an invalid operation. If you can cope with the question "Is an 
apple equal to a puppy dog?" without shouting "CANNOT COMPUTE!!!" and 
running down the street, there's no reason to treat NAN == NAN as 
anything worse.

So what should NAN == NAN equal? Consider the answer to the apple and 
puppy dog comparison. Chances are that anyone asked that will give you a 
strange look and say "Of course not, you idiot". (In my experience, and 
believe it or not I have actually tried this, some people will ask you to 
define equality. But they're a distinct minority.)

If you consider "equal to" to mean "the same as", then the answer is 
clear and obvious: apples do not equal puppies, and any INVALID sentinel 
is not equal to any other INVALID. (Remember, NAN is not a value itself, 
it's a sentinel representing the fact that you don't have a valid number.)

So NAN == NAN should return False, just like the standard states, and 
NAN != NAN should return True. "No, of course not, they're not equal."
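This is exactly what Python's floats do. Note that the invariant "(x != y) == not (x == y)" still holds, and ordered comparisons involving a NAN are all False:

```python
nan = float("nan")
other = float("nan")

print(nan == other)    # False
print(nan != other)    # True
print(nan == nan)      # False, even for the very same object
print((nan != other) == (not (nan == other)))   # True
print(nan < 1.0, nan > 1.0)                     # False False
```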

> Apart from anything else, defining "NaN == NaN" as False means that "x
> == x" is False if x is NaN, which violates one of the fundamental axioms
> of an equivalence relation (and, in every other regard, "==" is normally
> intended to be an equivalence relation).

Yes, that's a consequence of NAN behaviour. I can live with that.
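In Python the practical fallout of this is softened for containers, which is what makes a NAN usable in a set or as a dict key at all. Python's language reference documents that for built-in containers such as list, tuple, set and dict, "x in y" is equivalent to any(x is e or x == e for e in y), so identity is checked before equality:

```python
x = float("nan")

print(x == x)                 # False: == is not reflexive for a NAN
print(x in [x])               # True: the same object is found by identity
print(x in [float("nan")])    # False: a different NAN object, and NAN != NAN

s = {x}
print(x in s)                 # True, via the same identity check
print(float("nan") in s)      # False
```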

> The creation of NaN was a pragmatic decision on how to handle
> exceptional conditions in hardware. It is not holy writ, and there's no
> fundamental reason why a high-level language should export the
> hardware's behaviour verbatim.

There is a good, solid reason: it's a *useful* standard that *works*, 
proven in practice, invented by people who have forgotten more about 
floating point than you or I will ever learn, and we dismiss their 
conclusions at our peril.

A less good reason: it's a standard. Better to stick to a not-very-good 
standard than to have the Wild West, where everyone chooses their own 
behaviour. You have NAN == NAN raise ValueError, Fred has it return True, 
George has it return False, Susan has it return a NAN, Michelle makes it 
raise MathError, somebody else returns Maybe ... 

But IEEE-754 is not just a "not-very-good" standard. It is an extremely 
good standard.

>>> Arguably, "nan != nan" should also be false, but that would violate
>>> the invariant "(x != y) == !(x == y)".
>> 
>> I cannot imagine what that argument would be. Please explain.
> 
> A result of NaN means that the result of the calculation is undefined,
> so the value is "unknown". 

Incorrect. NANs are not "unknowns", or missing values.

-- 
Steven