String to Float, without introducing errors

Sun Dec 18 16:53:56 EST 2022

On 18Dec2022 18:35, Paul St George <email at paulstgeorge.com> wrote:
>So I am working on a physics paper with a colleague. We have a theory about Newtons Cradle. We answer the question why when you lift and drop balls 1 and 2, balls 4 and 5 rise up. I could say more, but ... (if you are interested please write to me).
>
>We want to illustrate the paper with animations. The theory includes distortion of the balls and this distortion is very very small. So, I am sent data with locations and dimensions to 13 decimal places. Something strange is happening with the animations: the balls are not moving smoothly. I do not know (yet) where the problem lies so it is difficult to provide a clear narrative.
>
>Because there is a problem, I am investigating in all areas. This brings me to the question I asked here. I am not expecting six decimal places or three decimal places to be as accurate as thirteen decimal places, but I would like to be in control of or fully aware of what goes on under the bonnet.

First the short take: your machine is probably quite precise, and float
is far more performant than the other numeric types available. Your
source data seem to have more round off than the rounding in a float.

Under the bonnet:

A Python float is effectively a base-2 value in scientific notation.  
Internally it has a base-2 mantissa and base-2 exponent. This page:
https://docs.python.org/3/library/stdtypes.html#typesnumeric
says that CPython's float uses C's "double" floating point type
(you are almost certainly using the CPython implementation) and thus
you're using the machine's floating point implementation.

I believe that almost all modern CPUs implement IEEE 754 floating point:
https://en.wikipedia.org/wiki/IEEE_754

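Because they're base 2, various values in other bases will not be
precisely representable as a float. For example, 1/3 has no exact float
form (and, as you will know, it is _also_ not representable precisely as
a base-10 value such as 0.333). Likewise a decimal fraction such as 0.1
has no exact base-2 form, so it is stored as the nearest float.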
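
As a small sketch of what "not precisely representable" means in
practice: Decimal and Fraction both convert a float exactly, so they
show the value that is really stored. The choice of 0.1 and 0.5 as
sample values is mine, not from your data.

```python
from decimal import Decimal
from fractions import Fraction

# Decimal(float) and Fraction(float) convert the stored binary value
# exactly, so they reveal what the float really holds.
stored = Decimal(0.1)
print(stored)                              # exact value stored for the literal 0.1

print(Fraction(0.1) == Fraction(1, 10))    # False: the float is not exactly 1/10
print(Fraction(0.5) == Fraction(1, 2))     # True: 1/2 is a power of two, so it is exact
```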

You can get specifics of your Python's floating point from
`sys.float_info`, i.e.:

     from sys import float_info

Then look at float_info.epsilon etc. Details:
https://docs.python.org/3/library/sys.html#sys.float_info

Here's my machine:

     Python 3.10.6 (main, Aug 11 2022, 13:47:18) [Clang 12.0.0 
     (clang-1200.0.32.29)] on darwin
     Type "help", "copyright", "credits" or "license" for more 
     information.
     >>> from sys import float_info
     >>> float_info
     sys.float_info(max=1.7976931348623157e+308, max_exp=1024, 
     max_10_exp=308, min=2.2250738585072014e-308, min_exp=-1021, 
     min_10_exp=-307, dig=15, mant_dig=53, epsilon=2.220446049250313e-16, 
     radix=2, rounds=1)

Values of note: mant_dig=53 (53 base-2 bits), dig=15 (15 decimal digits 
of precision).
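
To make epsilon concrete, here is a minimal sketch, assuming an IEEE 754
double as on my machine above. It only uses sys.float_info; the name
`eps` is just a local shorthand.

```python
import sys

eps = sys.float_info.epsilon    # gap between 1.0 and the next float above it

print(1.0 + eps > 1.0)          # True: eps is just big enough to register
print(1.0 + eps / 2 == 1.0)     # True: half of eps is lost to rounding

# With a radix-2 mantissa of mant_dig bits, eps is 2 ** -(mant_dig - 1).
print(eps == 2.0 ** -(sys.float_info.mant_dig - 1))
```

The gap scales with the magnitude of the value, so near 64550 the
spacing between adjacent floats is much larger than eps itself.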

You might want to look at sys.float_repr_style here:
https://docs.python.org/3/library/sys.html#sys.float_repr_style
which affects how Python writes floats out. In particular this text:

      If the string has value 'short' then for a finite float x, repr(x) 
      aims to produce a short string with the property that 
      float(repr(x)) == x. This is the usual behaviour in Python 3.1 and 
      later.

Again, on my machine:

     >>> 64550.727
     64550.727
     >>> 64550.728
     64550.728
     >>> 64550.72701
     64550.72701
     >>> 64550.7270101
     64550.7270101
     >>> 64550.727010101
     64550.727010101
     >>> 64550.72701010101
     64550.72701010101
     >>> 64550.7270101010101
     64550.72701010101
     >>> 64550.727010101010101
     64550.72701010101
     >>> 64550.72701010101010101
     64550.72701010101
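
The round-trip property quoted above can be checked directly. This is
only a sketch, assuming float_repr_style is 'short' as on my machine; the
input string is one of the values from the session above.

```python
x = float("64550.7270101010101")    # more digits than a float can hold

print(repr(x))                       # shortest string that converts back to x
print(float(repr(x)) == x)           # True: the round trip is lossless

# 17 significant digits are always enough to identify a double uniquely.
seventeen = format(x, ".17g")
print(seventeen)
print(float(seventeen) == x)         # True
```

So the trailing digits that Python drops when printing are digits the
float never held in the first place.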

>>> On 17 Dec 2022, at 16:54:05 EST 2022, Thomas Passin wrote:
>On 12/17/2022 3:45 PM, Paul St George wrote:
>> Thanks to all!
>> It was the rounding error that I needed to avoid (as Peter J. Holzer suggested). The use of decimal solved it and just in time. I was about to truncate the number, get each of the characters from the string mantissa, and then do something like this:
>>
>> 64550.727
>>
>> 64550 + (7 * 0.1) + (2 * 0.01) + (7 * 0.001)
>>
>> Now I do not need to!

Good, because if you do that using floats it will be less precise than
float("64550.727"). (Which I see Alan has already stated.)
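
A sketch of why, using Decimal only as an exact yardstick to measure the
error of each float. The variable names are mine; the digit-by-digit sum
is the one from your message.

```python
from decimal import Decimal

exact = Decimal("64550.727")        # the value as written in the file
direct = float("64550.727")         # one correctly rounded conversion
summed = 64550 + (7 * 0.1) + (2 * 0.01) + (7 * 0.001)   # several rounded steps

# Decimal(float) is exact, so these are the true representation errors.
direct_error = abs(Decimal(direct) - exact)
summed_error = abs(Decimal(summed) - exact)

print(direct_error)
print(summed_error)
print(direct_error <= summed_error)  # True: the direct conversion is never worse
```

The direct conversion yields the float nearest to the written value, so
no sequence of float additions can land closer.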

Your source file contains strings like "64550.727". As written, they
already carry fewer than 13 decimal places, i.e. some round off already
took place when that file was written. Do you know the precision of the
source data?

I suspect that rather than chasing a "perfect" representation of your
source data, which is already rounded off, you should:
- see if the source values can be obtained more precisely
- figure out which operations in your simulation contribute to the 
   motion roughness you see

I'm no expert on floating point coding for precision, but I believe that
trying to work with values "close together" in magnitude is important,
because combining values of very different scales effectively converts
the smaller one to the larger one's scale (i.e. the same exponent), with
a corresponding loss of precision in its mantissa. That may require you
to arrange your calculations carefully.
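
Here is a minimal sketch of that effect. The increment of 1e-13 and the
origin at 64550 are hypothetical values of my own, chosen to resemble a
tiny distortion added to a position of the size in your file. math.ulp
needs Python 3.9 or later.

```python
import math

x = float("64550.727")      # a position of the size seen in the source file
tiny = 1e-13                # hypothetical distortion in the 13th decimal place

print(math.ulp(x))          # spacing between adjacent floats near x
print(x + tiny == x)        # True: the increment is below half that spacing

# Working relative to a nearby reference keeps the small term representable.
offset = 0.727 + tiny       # position measured from a hypothetical origin at 64550
print(offset != 0.727)      # True: at this magnitude the increment survives
```

If your distortions are of that order, keeping them as separate small
quantities for as long as possible may preserve more of them than adding
them to the large coordinates early on.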

See if you can locate a source for the jerkiness (by printing
intermediate results) and then maybe restructure that step?
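
One way to print intermediate results, as a rough sketch only: look at
the frame-to-frame change in a single coordinate. The function name
frame_steps and the sample positions are hypothetical, not taken from
your animation code.

```python
def frame_steps(positions):
    """Return the change in position between consecutive frames."""
    return [b - a for a, b in zip(positions, positions[1:])]


# Hypothetical positions of one ball over five frames.
positions = [0.0, 1.0, 2.0, 2.0, 4.0]
steps = frame_steps(positions)

for frame, step in enumerate(steps, start=1):
    print(f"frame {frame}: step = {step!r}")

# Smooth motion shows steps that change gradually; a zero step followed by
# a double-sized one, as here, suggests two frames collapsed to one value.
```

Running the same check after each stage of your pipeline should show
where the steps stop being smooth.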

Cheers,
Cameron Simpson <cs at cskk.id.au>