float("nan") in set or as key

Gregory Ewing greg.ewing at canterbury.ac.nz
Fri Jun 3 20:14:03 EDT 2011


Steven D'Aprano wrote:
> Fair point. Call it an extension of the Kronecker Delta to the reals then.

That's called the Dirac delta function, and it's a bit different --
instead of a value of 1, it has an infinitely high spike of zero
width at the origin, whose integral is 1. (Which means it's not
strictly a function, because it's impossible for a true function
on the reals to have those properties.)

You don't normally use it on its own; usually it turns up as part
of an integral. I find it difficult to imagine a numerical algorithm
that relies on directly evaluating it. Such an algorithm would be
numerically unreliable. You just wouldn't do it that way; you'd
find some other way to calculate the integral that avoids evaluating
the delta.
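
For illustration only (a throwaway sketch, not from any real code):
stand in for the delta with a unit-area Gaussian of width eps and try
to evaluate the integral of f(x)*delta(x - a) with a fixed-step sum.
The sifting property says the answer should be f(a); once eps falls
below the grid step the naive sum is garbage, either missing the
spike entirely or grossly over-counting it, depending on where it
happens to land:

    import math

    def f(x):
        return math.cos(x)

    def delta_eps(x, eps):
        # unit-area Gaussian of width eps standing in for the delta
        return math.exp(-0.5 * (x / eps) ** 2) / (eps * math.sqrt(2 * math.pi))

    a, h = 0.5, 1e-3          # delta centred at a, fixed grid step h
    for eps in (1e-1, 1e-3, 1e-6):
        approx = sum(f(n * h) * delta_eps(n * h - a, eps) * h
                     for n in range(-5000, 5001))
        print(eps, approx, "should be about", f(a))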

> y = 2.1e12
> if abs(x - y) <= 1e-9:
>     # x is equal to y, within exact tolerance
>     ...

If you expect your numbers to be on the order of 1e12, then 1e-9
is obviously not a sensible choice of tolerance. You don't just
pull tolerances out of thin air; you justify them based on
knowledge of the problem at hand.
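
For instance (names and numbers here are mine, just to make the
point): a tolerance that scales with the operands means something at
1e12 in a way that a fixed absolute 1e-9 never can.

    def approx_equal(x, y, rel_tol=1e-9, abs_tol=0.0):
        # a relative error of 1e-9 is meaningful at any magnitude;
        # an *absolute* 1e-9 only means something for numbers near 1
        return abs(x - y) <= max(rel_tol * max(abs(x), abs(y)), abs_tol)

    x, y = 2.1e12 + 0.5, 2.1e12
    print(abs(x - y) <= 1e-9)    # False: 0.5 dwarfs an absolute 1e-9
    print(approx_equal(x, y))    # True: 0.5 is about 2.4e-13 of 2.1e12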

> In practice, either the function needs some sort of "how to decide 
> equality" parameter,

If it's general purpose library code, then yes, that's exactly
what it needs.
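
Something along these lines, say (a sketch, the names are made up):
let the caller hand in the equality test, with exact comparison as
the default.

    import operator

    def find_index(seq, value, eq=operator.eq):
        # index of the first item that eq() says matches value
        for i, item in enumerate(seq):
            if eq(item, value):
                return i
        raise ValueError("no matching item")

    data = [0.1 + 0.2, 0.5, 1.0]
    # find_index(data, 0.3) would raise, since 0.1 + 0.2 != 0.3 exactly
    print(find_index(data, 0.3, eq=lambda a, b: abs(a - b) <= 1e-12))  # 0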

> or you use exact floating point equality and leave it 
> up to the caller to make sure the arguments are correctly rounded

Not really a good idea. Trying to deal with this kind of thing
by rounding is fraught with difficulties and pitfalls. It can
only work when you're not really using floats as approximations
of reals, but as some set of discrete values, in which case
it's probably safer to use appropriately scaled integers.
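
For example (again just a sketch): if the quantities really are
cents, or ticks, or grid indices, keep them as scaled integers from
the start and the question of fuzzy equality never arises.

    price_a = 10      # $0.10 held as integer cents
    price_b = 20      # $0.20
    total   = 30      # $0.30

    print(0.1 + 0.2 == 0.3)              # False: the classic float surprise
    print(price_a + price_b == total)    # True: integer arithmetic is exact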

> - from William Kahan, and the C99 standard: hypot(INF, x) is always INF 
> regardless of the value of x, hence hypot(INF, NAN) returns INF.
> 
> - since pow(x, 0) is always 1 regardless of the value of x, pow(NAN, 0) 
> is also 1.

These are different from your kronecker(), because the result
*never* depends on the value of x, whether it's NaN or not.
But kronecker()'s result clearly does sometimes depend on the value of x.

The reasoning appears to be based on the idea that NaN means
"some value, we just don't know what it is". Accepting that
interpretation, the argument doesn't apply to kronecker().
You can't say that the NaN in kronecker(NaN, 42) doesn't
matter, because if you don't know what value it represents,
you can't be sure that it *isn't* meant to be 42.
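
To spell that out (kronecker() below is just the obvious
1-if-equal-else-0 definition I'm taking from context; the hypot and
pow results are CPython's math module following C99):

    import math

    nan, inf = float("nan"), float("inf")

    print(math.hypot(inf, nan))   # inf -- inf no matter what the NaN "is"
    print(math.pow(nan, 0))       # 1.0 -- likewise, x**0 is 1 for every x
    print(math.pow(nan, 2))       # nan -- here x matters, so the NaN propagates

    def kronecker(x, y):
        return 1 if x == y else 0

    # Here the result *does* depend on x: if the NaN secretly stood for
    # 42, the right answer would have been 1, so returning 0 is a guess.
    print(kronecker(nan, 42))     # 0
    print(kronecker(42, 42))      # 1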

> Another standard example where NANs get thrown away is the max and min 
> functions. The latest revision of IEEE-754 (2008) allows for max and min 
> to ignore NANs.

Do they provide a justification for that? I'm having trouble
seeing how it makes sense.
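
For what it's worth, Python's own max() and min() today neither
propagate NaNs nor ignore them consistently: every comparison
involving a NaN is false, so the result simply depends on argument
order.

    nan = float("nan")

    print(max(1.0, nan))   # 1.0 -- nan > 1.0 is False, so 1.0 survives
    print(max(nan, 1.0))   # nan -- 1.0 > nan is also False, so nan survives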

-- 
Greg


