Precision issue

Mon Oct 13 10:29:26 EDT 2003

[Duncan Booth]
>>> There's no reason why Python couldn't do the same:
>>>
>>> def float_repr(x):
>>>      s = "%.15g" % x
>>>      if float(s)==x: return s
>>>      return "%.17g" % x
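Duncan's proposal runs as written. The sketch below restates it with a type hint and exercises two cases. It assumes a platform whose float/string conversions are correctly rounded in both directions; modern CPython uses its own conversion code on most platforms, so this holds there. Under that assumption it shows only the intended behaviour, not the cross-platform failure Tim describes next.

```python
def float_repr(x: float) -> str:
    """Duncan's proposal: use 15 significant digits if they read back exactly,
    otherwise fall back to 17, which are enough to distinguish any two doubles."""
    s = "%.15g" % x
    if float(s) == x:
        return s
    return "%.17g" % x


print(float_repr(0.1))        # 0.1
print(float_repr(0.1 + 0.2))  # 0.30000000000000004
```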

[Tim]
>> Sorry, but there is a reason:  if done on a platform whose C library
>> implements perfect-rounding double->string (e.g., I think gcc does
>> now), this can hit cases where the string may not reproduce x when
>> eval'ed back on a different platform whose C library isn't so
>> conscientious but which nevertheless meets the 754 standard's more
>> forgiving (than perfect rounding) requirements.
>>
>> This is acutely important because Python's marshal format (used for
>> .pyc files) represents floats as repr'ed strings.  By making repr()
>> pump out 17 digits, we maximize the odds that .pyc files ported
>> across platforms load back exactly the same 754 doubles across (754)
>> platforms.

[Duncan]
> Thanks for giving me the reason, but I find this argument
> unconvincing on several counts.
>
> If a system has an inaccurate floating point library, then introducing
> further inconsistencies based on whether the .pyc file was compiled
> locally or copied from another system doesn't sound like a good
> solution. Surely if the library is inaccurate you are going to get
> inaccurate results no matter what tweaks Python tries to apply?

You snipped most of my msg.  As explained in the parts not reproduced here,
Python is aiming to work correctly across (at least) platforms where the
native C library meets the minimal requirements of the 754 standard for
float <-> string accuracy.  That doesn't require perfect rounding in all
cases, but to call a system meeting no more than the minimal requirements
"inaccurate" is quite a stretch.  Perfect rounding can require
multi-thousand-bit arithmetic in some cases, and that's why the standard
allowed for a small bit of slop.  Perfect rounding isn't necessary for
eval(repr(float)) == float to hold always; it's enough that platforms
meet the minimal 754 requirements and at least 17 significant digits are
produced in the float->string direction.
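To make the 17-digit bound concrete, the sketch below reinterprets random 64-bit patterns as finite doubles, formats each with a given number of significant digits, and checks whether it parses back exactly. Caveat: modern CPython rounds correctly in both directions, so this exercises only the best case. It cannot reproduce the looser but still conforming 754 libraries under discussion. The helper names are illustrative.

```python
import math
import random
import struct


def round_trips(x: float, digits: int) -> bool:
    """True if x, printed with `digits` significant digits, reads back as exactly x."""
    return float("%.*g" % (digits, x)) == x


def random_finite_double(rng: random.Random) -> float:
    """Reinterpret 64 random bits as a double, retrying on NaN and infinity."""
    while True:
        bits = rng.getrandbits(64)
        x = struct.unpack("<d", struct.pack("<Q", bits))[0]
        if math.isfinite(x):
            return x


rng = random.Random(2003)
samples = [random_finite_double(rng) for _ in range(20_000)]
fails_at_15 = sum(not round_trips(x, 15) for x in samples)

print(all(round_trips(x, 17) for x in samples))  # True
print(fails_at_15, "of", len(samples), "fail at 15 digits")
```

Seventeen digits always suffice because a binary64 double has a 53-bit significand. Fifteen digits do not, which is why Duncan's scheme needs its fallback.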

> Also the marshal code doesn't actually use repr.  For that matter the
> interactive prompt which is what causes the problems I want to avoid in
> the first place doesn't use repr either!  (Marshal uses
> PyFloat_AsReprString which comments say should be deprecated, repr
> uses float_repr, and interactive mode uses float_print.)

PyFloat_AsReprString(afloat) is the C API spelling of the Python-level
repr(afloat), as documented in floatobject.h.  The comments say it should be
deprecated because it "pass[es] a char buffer without passing a length",
which has nothing to do with the result it produces; adding a buffer length
argument would satisfy the complaint.

It's a general rule that repr(obj) is produced at the interactive prompt
regardless of the type of obj; the specific function called to produce that
result in the specific case of isinstance(obj, float) isn't really
interesting; what's relevant is that it *does* produce repr(float), however
it's implemented.  It's also a general rule that eval(repr(obj)) == obj
should hold when sanely possible, again without regard to type(obj).  That
last rule is why repr(float) does what it does; marshal exploits it.

There are other complaints that can be made about the interactive prompt
using repr(), and many such have been made over the years.  sys.displayhook
was introduced in the hopes that people would build prompt format functions
they like better, and share them.  It's remarkable (to me -- that's why I'm
remarking <wink>) that so few have.

> If you think it is important, I don't have any problems with leaving
> the marshalling code generating as many digits as it wants.

It's vital for marshal to try to reproduce floats across platforms.  It does
OK at that now, but I think it would be better for marshal to move to a
binary format.  That's got problems of its own, due to compatibility
hassles.
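A binary float format amounts to writing each double's 8-byte IEEE 754 bit pattern, which avoids decimal conversion entirely. The sketch below does this with struct, assuming both ends use IEEE 754 binary64 doubles. The helper names are illustrative. Marshal later adopted a binary float encoding in format version 2 (Python 2.5), and the last line checks that today's marshal round-trips a float exactly.

```python
import marshal
import struct


def pack_double(x: float) -> bytes:
    """Encode x as its 8-byte IEEE 754 binary64 bit pattern (little-endian)."""
    return struct.pack("<d", x)


def unpack_double(data: bytes) -> float:
    """Decode an 8-byte little-endian IEEE 754 binary64 bit pattern."""
    return struct.unpack("<d", data)[0]


x = 0.1 + 0.2
blob = pack_double(x)
print(len(blob))  # 8
assert unpack_double(blob) == x  # exact: no decimal digits involved

# Current marshal versions already store floats in binary
assert marshal.loads(marshal.dumps(x)) == x
```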

Regardless of what marshal does, it's still a general rule that Python
strive to maintain that eval(repr(x)) == x.  This is true now for all
builtin scalar types, and for lists, tuples and dicts composed (possibly
recursively) of those.
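The eval(repr(x)) == x rule can be spot-checked directly. The sketch below uses modern Python 3 spellings (for example, bytes), so it illustrates the rule rather than the 2003 type set exactly. It covers scalars and a nested container. The usual exception that "when sanely possible" allows for is NaN: repr(float('nan')) is 'nan', which eval() can't parse without extra names in scope, and NaN never compares equal to itself anyway.

```python
def survives_eval_repr(obj) -> bool:
    """True if eval(repr(obj)) reconstructs an equal object."""
    return eval(repr(obj)) == obj


examples = [
    42,
    2**100,
    0.1 + 0.2,
    1e-310,  # a subnormal double
    3 + 4j,
    "tab\there",
    b"\x00\xff",
    True,
    None,
    [1, (2.5, "x"), {"k": [0.1, -0.0]}],  # containers built from the above
]
print(all(survives_eval_repr(obj) for obj in examples))  # True
```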

repr(obj) can be an undesirable thing to produce at an interactive prompt
for many reasons, some depending on taste.  That's why sys.displayhook
exists, so you can change interactive prompt behavior.  The reason I like,
e.g., 0.1 *not* to display as "0.1" by default was given toward the end of
my msg (and had nothing to do with marshal, btw).
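The end of the message, which gives that reason, isn't part of this excerpt, and the sketch below does not try to restate it. It only shows the underlying fact: the double nearest 0.1 is not exactly one tenth. The 17-digit repr of that era exposes the discrepancy. Python 3.1 and later print the shortest string that round-trips, which hides it.

```python
from decimal import Decimal
from fractions import Fraction

x = 0.1
print(Decimal(x))     # exact value of the stored double
print("%.17g" % x)    # 0.10000000000000001, what repr(0.1) showed in 2003
print(repr(x))        # 0.1 on Python 3.1+
print(Fraction(x) == Fraction(1, 10))  # False: not exactly one tenth
```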