a.index(float('nan')) fails

Steven D'Aprano steve+comp.lang.python at pearwood.info
Fri Oct 26 14:40:55 EDT 2012


On Sat, 27 Oct 2012 03:45:46 +1100, Chris Angelico wrote:

> On Sat, Oct 27, 2012 at 3:23 AM, Steven D'Aprano
> <steve+comp.lang.python at pearwood.info> wrote:
>> In real life, you are *much* more likely to run into these examples of
>> "insanity" of floats than to be troubled by NANs:
>>
>> - associativity of addition is lost
>> - distributivity of multiplication is lost 
>> - commutativity of addition is lost
>> - not all floats have an inverse
>>
>> e.g.
>>
>> (0.1 + 0.2) + 0.3 != 0.1 + (0.2 + 0.3)
>>
>> 1e6*(1.1 + 2.2) != 1e6*1.1 + 1e6*2.2
>>
>> 1e10 + 0.1 + -1e10 != 1e10 + -1e10 + 0.1
>>
>> 1/(1/49.0) != 49.0
>>
>> Such violations of the rules of real arithmetic aren't even hard to
>> find. They're everywhere.
> 
> Actually, as I see it, there's only one principle to take note of: the
> "HMS Pinafore Floating Point Rule"...
> 
> ** Floating point expressions should never be tested for equality ** 
> ** What, never? **
> ** Well, hardly ever! **
> 
> The problem isn't with the associativity, it's with the equality
> comparison. Replace "x == y" with "abs(x-y)<epsilon" for some epsilon
> and all your statements fulfill people's expectations.

O RLY?

Would you care to tell us which epsilon they should use?

Hint: *whatever* epsilon you pick, there will be cases where it is 
stupidly too small, cases where it is stupidly too large, and cases where 
it degenerates to plain float equality. And you may not be able to tell 
which of those cases you are in.

Here's a concrete example for you: 

What *single* value of epsilon should you pick such that the following 
two expressions both evaluate correctly? (Mathematically, the first sum 
is 200 and the second is 100000.)

sum([1e20, 0.1, -1e20, 0.1]*1000) == 200
sum([1e20, 99.9, -1e20, 0.1]*1000) != 200
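
Here's a throwaway sketch you can run yourself (approx_equal is just my 
stand-in for the "abs(x-y) < epsilon" recipe; the printed values assume 
ordinary IEEE-754 doubles):

def approx_equal(x, y, epsilon):
    # The "never test for equality" recipe: equal-within-epsilon.
    return abs(x - y) < epsilon

a = sum([1e20, 0.1, -1e20, 0.1] * 1000)   # true sum is 200
b = sum([1e20, 99.9, -1e20, 0.1] * 1000)  # true sum is 100000

# Each 1e20 swallows whatever small total had accumulated before it,
# so both float sums collapse to the same tiny value.
print(a, b)                        # 0.1 0.1

# An epsilon big enough to make the first comparison come out True...
print(approx_equal(a, 200, 1e3))   # True  (correct)
# ...makes the second one come out "equal" as well, which is wrong.
print(approx_equal(b, 200, 1e3))   # True  (wrong: the true sum is 100000)

# And an epsilon small enough to get the second one right
# gets the first one wrong instead.
print(approx_equal(a, 200, 1e-9))  # False (wrong: the true sum is 200)
print(approx_equal(b, 200, 1e-9))  # False (correct)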


The advice "never test floats for equality" is:

(1) pointless without a good way to know what epsilon to use;

(2) sheer superstition, since there are cases where testing floats for 
equality is the right thing to do, as the short examples after this list 
show (although I note you dodged that bullet with "hardly ever" *wink*);

and most importantly

(3) missing the point, since the violations of the rules of real-valued 
mathematics still occur regardless of whether you explicitly test for 
equality or not.
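
Regarding point (2), here are a few quick illustrations of my own where 
exact float equality is exactly the right test:

x = 0.0
if x == 0.0:             # guarding a division: only an exact zero matters
    print("can't take the reciprocal of zero")

y = 1e400                # a literal too big for a double: overflows to inf
if y == float('inf'):    # detecting that overflow: only exact inf will do
    print("overflowed to infinity")

z = 10.0
if z == int(z):          # "is this float a whole number?"
    print("whole number")

# Sums of dyadic fractions that stay within 53 bits are exact:
assert 0.5 + 0.25 == 0.75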

As an example of point (3), if you write:

result = a + (b + c)

some compilers may assume associativity and calculate (a + b) + c 
instead. But that is not guaranteed to give the same result! (K&R allowed 
C compilers to do that; the subsequent ANSI C standard prohibited re-
ordering, but in practice most C compilers provide a switch to allow it.)
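
Here's a small Python sketch of why that regrouping matters (the values 
are just ones I picked to trigger the rounding):

a, b, c = 1e16, -1e16, 0.1

as_written = a + (b + c)    # b + c rounds back to -1e16, so this is 0.0
regrouped  = (a + b) + c    # what you get if the sum is regrouped: 0.1

print(as_written, regrouped)       # 0.0 0.1
print(as_written == regrouped)     # False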

A real-world example: Python's math.fsum is a high-precision summation 
with error compensation, in the same spirit as the Kahan summation 
algorithm. Here's a pseudo-code version of Kahan's:

http://en.wikipedia.org/wiki/Kahan_summation_algorithm

which includes the steps:

t = sum + y;
c = (t - sum) - y;

A little bit of algebra should tell you that c must equal zero. 
Unfortunately, in this case algebra is wrong, because floats are not real 
numbers. c is not necessarily zero.
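
A rough Python transcription of just those two steps (my own sketch, not 
the real fsum internals; "total" stands in for the pseudo-code's "sum" so 
as not to shadow the builtin):

total = 1e16
y = 0.1

t = total + y          # 0.1 is below the spacing of doubles near 1e16,
                       # so this rounds straight back to 1e16
c = (t - total) - y    # "algebra says" this must be zero...

print(t == total)      # True: the 0.1 was lost entirely
print(c)               # -0.1: c has captured exactly what was lost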

An optimizing compiler, or an optimizing programmer, might very well 
eliminate those calculations and so inadvertently eliminate the error 
compensation. And not an equals sign in sight.



-- 
Steven


