What exactly is "exact" (was Clean Singleton Docstrings)

Steven D'Aprano steve at pearwood.info
Mon Jul 18 23:42:43 EDT 2016


On Mon, 18 Jul 2016 08:15 pm, Chris Angelico wrote:

> On Mon, Jul 18, 2016 at 8:00 PM, Marko Rauhamaa <marko at pacujo.net> wrote:
>> Python programmers (among others) frequently run into issues with
>> surprising results in floating-point arithmetic. For better or worse,
>> Scheme has tried to abstract the concept. You don't need to explain the
>> ideas of IEEE 64-bit floating-point numbers or tie the hands of the
>> implementation. Instead, what you have is "reliable" arithmetic and
>> "best-effort" arithmetic, a bit like TCP is "reliable" and UDP is
>> "best-effort".
> 
> The problem with that is that failing to explain IEEE floating point
> and just calling it "inexact" scares people off unnecessarily. I've
> seen a lot of very intelligent people who think that you should never
> compare floats with the == operator, because floats randomly introduce
> "inaccuracy". 

Yes, this. "Never compare floats for equality" is a pernicious myth that
won't die.
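(To be fair, there are plenty of cases where equality is exact and
perfectly safe. These values are all represented exactly in IEEE 754
binary64, so the comparisons below hold in any conforming
implementation:

py> 0.5 + 0.25 == 0.75
True
py> 1.5 * 2 == 3.0
True

No epsilon required.)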


> And then you get these sorts of functions: 
> 
> EPSILON = 0.000001 # Adjust to control numeric accuracy
> def is_equal(f1, f2, epsilon=EPSILON):
>     if abs(f1) > abs(f2):
>         f1, f2 = f2, f1
>     return abs(f2-f1) < f1*epsilon
> 
> and interminable debates about how to pick an epsilon, whether it
> should be relative to the smaller value (as here) or the larger (use
> f2 instead), or maybe should be an absolute value, or maybe it should
> be relative to the largest/smallest value that was ever involved in
> the calculation, or........

Your code is buggy. Consider:

py> is_equal(-1.0, -1.0)
False
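The problem is that after the swap, f1 holds the value with the smaller
magnitude, which may be negative, and then f1*epsilon is a negative
tolerance that nothing can pass. One possible repair (a sketch only,
keeping your relative-to-the-smaller-value convention; note the strict
< also becomes <= so identical values compare equal):

def is_equal(f1, f2, epsilon=1e-6):
    # Scale the tolerance by the smaller magnitude, as in the
    # original, but take abs() so a negative value cannot turn
    # the tolerance negative.
    small, big = sorted([f1, f2], key=abs)
    return abs(big - small) <= abs(small)*epsilon

Or, in Python 3.5 or better, just use math.isclose, which handles
these details for you:

py> import math
py> math.isclose(-1.0, -1.0, rel_tol=1e-6)
True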



> Floating point numbers are a representation of real numbers that
> involves a certain amount of precision. They're ultimately no
> different from grade-school arithmetic where you round stuff off so
> you don't need an infinite amount of paper, except that they work with
> binary rather than decimal, so people think "0.1 + 0.2 ought to be
> exactly 0.3, why isn't it??", and blame floats.

Well, kinda... yes, ultimately deep down you're right. There's nothing
mysterious about floats. The failure of fundamental properties such as
associativity:

(a+b)+c = a+(b+c)

and distributivity:

a×(b+c) = a×b + a×c

is due to numbers being recorded in finite precision, which means that some
calculations are inexact. But the *consequences* of that simple fact are
quite profound, and difficult. Earlier you mentioned "interminable debates
about how to pick an epsilon", but the reason for that is that it is
really, really hard to pick an epsilon in any systematic, objective way.
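For a concrete look at the associativity failure, you don't need
anything more exotic than the usual suspects (these results are fully
deterministic in IEEE 754 binary64 arithmetic):

py> (0.1 + 0.2) + 0.3
0.6000000000000001
py> 0.1 + (0.2 + 0.3)
0.6

Same numbers, same operations, different answers, purely because of
where the intermediate rounding happens.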

In the statistics module, I have run into this problem. Where possible, and
surprisingly often, I can test for exact equality. For example, here are a
couple of tests for geometric mean:


    def test_multiply_data_points(self):
        # Test multiplying every data point by a constant.
        c = 111
        data = [3.4, 4.5, 4.9, 6.7, 6.8, 7.2, 8.0, 8.1, 9.4]
        expected = self.func(data)*c
        result = self.func([x*c for x in data])
        self.assertEqual(result, expected)

    def test_doubled_data(self):
        # Test doubling data from [a,b...z] to [a,a,b,b...z,z].
        data = [random.uniform(1, 500) for _ in range(1000)]
        expected = self.func(data)
        actual = self.func(data*2)
        self.assertApproxEqual(actual, expected, rel=1e-13)


I didn't hand-tune the constants in test_multiply_data_points, but
neither can I guarantee that the assertEqual test would still be
appropriate if you replaced them with other constants of similar
magnitude.

In the test_doubled_data case, rounding errors accumulate faster, and cancel
less often, so I use an inexact comparison. Why do I check for a relative
error of 1e-13, rather than 1e-12 or 2.5e-14? *shrug* I can't give an
objective reason for it. It just seems right to me: if the relative error
was much bigger, I'd say that the geometric mean function was too
inaccurate. If it were much smaller, it would be too hard to make the
tests pass.
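(For those following along at home: assertApproxEqual is a helper from
the statistics test suite, not from unittest. At its heart it is just a
relative-error check, something along these lines; this is a sketch,
and the real helper also supports an absolute tolerance:

def assert_approx_equal(actual, expected, rel=1e-13):
    # Pass if the relative error |actual - expected|/|expected| is
    # no greater than rel. An expected value of zero would need an
    # absolute tolerance instead; that case is omitted here.
    if abs(actual - expected) > rel*abs(expected):
        raise AssertionError('%r != %r to relative error %g'
                             % (actual, expected, rel))

)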




-- 
Steven
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.



