on floating-point numbers

Hope Rouselle hrouselle at jevedi.com
Sat Sep 4 10:42:38 EDT 2021


Richard Damon <Richard at Damon-Family.org> writes:

> On 9/4/21 9:40 AM, Hope Rouselle wrote:
>> Chris Angelico <rosuav at gmail.com> writes:
>> 
>>> On Fri, Sep 3, 2021 at 4:58 AM Hope Rouselle <hrouselle at jevedi.com> wrote:
>>>>
>>>> Hope Rouselle <hrouselle at jevedi.com> writes:
>>>>
>>>>> Just sharing a case of floating-point numbers.  Nothing needed to be
>>>>> solved or to be figured out.  Just bringing up conversation.
>>>>>
>>>>> (*) An introduction to me
>>>>>
>>>>> I don't understand floating-point numbers from the inside out, but I do
>>>>> know how to work with base 2 and scientific notation.  So the idea of
>>>>> expressing a number as
>>>>>
>>>>>   mantissa * base^{power}
>>>>>
>>>>> is not foreign to me. (If that helps you to perhaps instruct me on
>>>>> what's going on here.)
>>>>>
>>>>> (*) A presentation of the behavior
>>>>>
>>>>>>>> import sys
>>>>>>>> sys.version
>>>>> '3.8.10 (tags/v3.8.10:3d8993a, May 3 2021, 11:48:03) [MSC v.1928 64
>>>>> bit (AMD64)]'
>>>>>
>>>>>>>> ls = [7.23, 8.41, 6.15, 2.31, 7.73, 7.77]
>>>>>>>> sum(ls)
>>>>> 39.599999999999994
>>>>>
>>>>>>>> ls = [8.41, 6.15, 2.31, 7.73, 7.77, 7.23]
>>>>>>>> sum(ls)
>>>>> 39.60000000000001
>>>>>
>>>>> All I did was to take the first number, 7.23, and move it to the last
>>>>> position in the list.  (So the order of operations changed the result:
>>>>> a violation of the associativity -- not commutativity -- of addition.)
>>>>
>>>> Suppose these numbers are prices in dollars, never going beyond cents.
>>>> Would it be safe to multiply each one of them by 100 and therefore work
>>>> with cents only?  For instance
>>>
>>> Yes and no. It absolutely *is* safe to always work with cents, but to
>>> do that, you have to be consistent: ALWAYS work with cents, never with
>>> floating point dollars.
>>>
>>> (Or whatever other unit you choose to use. Most currencies have a
>>> smallest-normally-used-unit, with other currency units (where present)
>>> being whole number multiples of that minimal unit. Only in forex do
>>> you need to concern yourself with fractional cents or fractional yen.)
>>>
>>> But multiplying a set of floats by 100 won't necessarily solve your
>>> problem; you may have already fallen victim to the flaw of assuming
>>> that the numbers are represented accurately.
>> 
>> Hang on a second.  I see it's always safe to work with cents, but I'm
>> only confident to say that when one gives me cents to start with.  In
>> other words, if one gives me integers from the start.  (Because then, of
>> course, I don't even have floats to worry about.)  If I'm given 1.17,
>> say, I am not confident that I could turn this number into 117 by
>> multiplying it by 100.  And that was the question.  Can I always
>> multiply such IEEE 754 dollar amounts by 100?
>> 
>> Considering your last paragraph above, I should say: if one gives me an
>> accurate floating-point representation, can I assume a multiplication of
>> it by 100 remains accurately representable in IEEE 754?
>> 
>
> Multiplication by 100 might not be accurate if the number you are
> starting with is close to the limit of precision, because 100 is
> 1.1001 (binary) x 64, so multiplying by 100 adds about 5 more 'bits'
> to the representation of the number. In your case, the numbers are
> well below that point.

Alright.  That's clear now.  Thanks so much!
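Richard's point about the precision limit can be checked directly in the
REPL.  A small sketch (plain Python, nothing assumed beyond the language
itself): a float whose significand already uses all 53 bits cannot be
multiplied by 100 exactly, while Python's integer arithmetic stays exact.

```python
# A float whose significand occupies all 53 bits of an IEEE 754 double.
x = float(2**52 + 1)
assert x == 2**52 + 1          # still exactly representable as a float

exact = (2**52 + 1) * 100      # Python ints: always exact
approx = int(x * 100)          # float multiply must round back to 53 bits
print(exact == approx)         # False -- the product needed more bits
```

With the small dollar amounts in the thread, the product stays far below
2**53, so the same multiplication is exact.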

>>>> --8<---------------cut here---------------start------------->8---
>>>>>>> ls = [7.23, 8.41, 6.15, 2.31, 7.73, 7.77]
>>>>>>> sum(map(lambda x: int(x*100), ls)) / 100
>>>> 39.6
>>>>
>>>>>>> ls = [8.41, 6.15, 2.31, 7.73, 7.77, 7.23]
>>>>>>> sum(map(lambda x: int(x*100), ls)) / 100
>>>> 39.6
>>>> --8<---------------cut here---------------end--------------->8---
>>>>
>>>> Or multiplication by 100 isn't quite ``safe'' to do with floating-point
>>>> numbers either?  (It worked in this case.)
>>>
>>> You're multiplying and then truncating, which risks a round-down
>>> error. Try adding a half onto them first:
>>>
>>> int(x * 100 + 0.5)
>>>
>>> But that's still not a perfect guarantee. Far safer would be to
>>> consider monetary values to be a different type of value, not just a
>>> raw number. For instance, the value $7.23 could be stored internally
>>> as the integer 723, but you also know that it's a value in USD, not a
>>> simple scalar. It makes perfect sense to add USD+USD, it makes perfect
>>> sense to multiply USD*scalar, but it doesn't make sense to multiply
>>> USD*USD.
>> 
>> Because of the units?  That would be USD squared?  (Nice analysis.)
>> 
>>>> I suppose that if I multiply it by a power of two, that would be an
>>>> operation that I can be sure will not bring about any precision loss
>>>> with floating-point numbers.  Do you agree?
>>>
>>> Assuming you're nowhere near 2**53, yes, that would be safe. But so
>>> would multiplying by a power of five. The problem isn't precision loss
>>> from the multiplication - the problem is that your input numbers
>>> aren't what you think they are. That number 7.23, for instance, is
>>> really....
>> 
>> Hm, I think I see what you're saying.  You're saying multiplication and
>> division in IEEE 754 is perfectly safe --- so long as the numbers you
>> start with are accurately representable in IEEE 754 and assuming no
>> overflow or underflow would occur.  (Addition and subtraction are not
>> safe.)
>> 
>
> Addition and Subtraction are just as safe, as long as you stay within
> the precision limits. Multiplication and division by powers of two are
> the safest, not needing to add any precision, until you hit the limits
> of the magnitude of numbers that can be expressed.
>
> The problem is that a number like 0.1 isn't precisely represented, so it
> ends up using ALL available precision to get the closest value to it so
> ANY operations on it run the danger of precision loss.

Got it.  That's clear now.  It should have been before, but my
attention is that of a beginner, so some extra iterations turn up.  As
long as the numbers involved are accurately representable,
floating-point has no other problems.  I may, then, conclude that the
whole difficulty with floating-point is nothing but exceeding the
space reserved for the number.

However, I still lack an easy method to detect when a number is not
accurately representable by the floating-point datatype in use.  For
instance, 0.1 is not accurately representable in IEEE 754, but I don't
know how to check that:

>>> 0.1
0.1 # no clue
>>> 0.1 + 0.1
0.2 # no clue
>>> 0.1 + 0.1 + 0.1
0.30000000000000004 # there is the clue

How can I get a clearer and quicker evidence that 0.1 is not accurately
representable --- using the REPL?
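(One quick way to get that evidence from the REPL, using the
standard-library decimal module: converting a float to Decimal exposes
the float's exact stored value, which repr() rounds away.)

```python
from decimal import Decimal

# Decimal(float) converts the float's exact binary value to decimal,
# exposing the rounding that repr() hides.
print(Decimal(0.1))
# 0.1000000000000000055511151231257827021181583404541015625
print(Decimal(0.25))   # 0.25 -- a power-of-two fraction, stored exactly
```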

I know 

  0.1 = 1/10 = 1 * 10^-1

and in base 2 that would have to be represented as...  Let me calculate
it with my sophisticated skills (repeatedly doubling the fractional
part and reading off the integer digits):

  0.1 x 2 --> 0 + 0.2
  0.2 x 2 --> 0 + 0.4
  0.4 x 2 --> 0 + 0.8
  0.8 x 2 --> 1 + 0.6
  0.6 x 2 --> 1 + 0.2, closing the cycle.

So 0.1 is representable only as the repeating expansion 0.000110011...
In other words, 1/10 in base 10 equals 1/2^4 + 1/2^5 + 1/2^8 + 1/2^9 + ...

The same question in other words --- what's a trivial way for the REPL
to show me such cycles occur?
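(One trivial check, sketched with the standard fractions module: a
fraction has a finite binary expansion exactly when its reduced
denominator is a power of two, so it suffices to inspect the
denominator.  This ignores the separate 53-bit significand limit, which
can still force rounding even when the expansion is finite.)

```python
from fractions import Fraction

def exactly_representable(num, den):
    """True iff num/den has a finite binary expansion
    (ignoring the 53-bit significand limit)."""
    d = Fraction(num, den).denominator   # reduced denominator
    return d & (d - 1) == 0              # power-of-two test

print(exactly_representable(1, 10))   # False -> the 0011 cycle repeats
print(exactly_representable(1, 4))    # True  -> 0.01 in binary
```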

>>>>>> 7.23.as_integer_ratio()
>>> (2035064081618043, 281474976710656)

Here's what I did on this case.  The REPL is telling me that 

  7.23 = 2035064081618043/281474976710656

If that were true, then 7.23 * 281474976710656 would have to equal
2035064081618043.  So I typed:

>>> 7.23 * 281474976710656
2035064081618043.0

That agrees with the claimed ratio, so I get no evidence of the
problem.  (Of course: the multiplication reuses the very same rounded
float, so the check is circular.)

When I take control of my life out of the hands of misleading
computers, I calculate the sum by hand:

       844424930131968      (  3 x 281474976710656)
      5629499534213120      ( 20 x 281474976710656)
    197032483697459200      (700 x 281474976710656)
    ==================
    203506408161804288      (723 x 281474976710656)
=/= 203506408161804300      (100 x 2035064081618043)

How can I save the energy spent on such manual verification?
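(The manual sum above can be delegated to Python's exact integer and
Fraction arithmetic -- a quick sketch using only the standard library.)

```python
from fractions import Fraction

num, den = (7.23).as_integer_ratio()
print(num, den)                      # the ratio the REPL reported

# Exact check, in integers only: is the decimal 7.23 really num/den?
# 723/100 == num/den  <=>  723 * den == 100 * num.
print(723 * den == 100 * num)        # False: the float is not 7.23

# Or directly: the float's exact value versus the exact decimal 7.23.
print(Fraction(7.23) == Fraction(723, 100))   # False
```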

Thanks very much.

