[Python-Dev] Round Bug in Python 1.6?

Tim Peters tim_one@email.msn.com
Sun, 9 Apr 2000 16:14:17 -0400


[Tim]
>> If they're surprised by this, they indeed don't understand the
>> arithmetic at all!  This is an argument for using a different form of
>> arithmetic, not for lying about reality.

> This is not lying!

Yes, I overstated that.  It's not lying, but I defy anyone to explain the
full truth of it in a way even Guido could understand <0.9 wink>.  "Shortest
conversion" is a subtle concept, requiring knowledge not only of the
mathematical value, but of details of the HW representation.  Plain old
"correct rounding" is HW-independent, so is much easier to *fully*
understand.  And in things floating-point, what you don't fully understand
will eventually burn you.

Note that in a machine with 2-bit floating point, the "shortest conversion"
for 0.75 is the string "0.8":  this should suggest the sense in which
"shortest conversion" can be actively misleading too.

> If you type in "3.1416" and Python says "3.1416", then indeed it is the
> case that "3.1416" is a correct way to type in the floating-point number
> being expressed.  So "3.1415999999999999" is not any more truthful than
> "3.1416" -- it's just more annoying.

Yes, shortest conversion is *defensible*.  But Python has no code to
implement that now, so it's not an option today.
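
For the record, the concept itself is easy to state as a brute-force sketch
(an illustration only, not proposed 1.6 code, and it inherits whatever
sloppiness the platform libc has):

    def shortest(x):
        # Fewest significant digits whose decimal value reads back as exactly x.
        for ndigits in range(1, 18):
            s = '%.*g' % (ndigits, x)
            if float(s) == x:
                return s
        return '%.17g' % x    # 17 significant digits always suffice for a 754 double

    print(shortest(0.1))      # 0.1
    print(shortest(3.1416))   # 3.1416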

> I just tried this in Python 1.5.2+:
>
>     >>> .1
>     0.10000000000000001
>     >>> .2
>     0.20000000000000001
>     >>> .3
>     0.29999999999999999
>     >>> .4
>     0.40000000000000002
>     >>> .5
>     0.5
>     >>> .6
>     0.59999999999999998
>     >>> .7
>     0.69999999999999996
>     >>> .8
>     0.80000000000000004
>     >>> .9
>     0.90000000000000002
>
> Ouch.

As shown in my reply to Christian, shortest conversion is not a cure for
this "gosh, it printed so much more than I expected it to" complaint; it
only appears to "fix it" in the simplest examples.  So long as you want
eval(what's_displayed) == what's_typed, this is unavoidable.  The only ways
to avoid that are to use a different arithmetic, or stop using repr() at the
prompt.
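
To see why no display rule can rescue it, look at the arithmetic itself
(exact digits below assume IEEE-754 doubles and a faithful libc):

    print(0.1 + 0.2 == 0.3)        # false -- no printing policy can hide this
    print('%.20f' % 0.1)           # 0.10000000000000000555  <- what's actually stored
    print('%.20f' % (0.1 + 0.2))   # 0.30000000000000004441
    print('%.20f' % 0.3)           # 0.29999999999999998890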

>> As above.  repr() shouldn't be used at the interactive prompt
>> anyway (but note that I did not say str() should be).

> What, then?  Introduce a third conversion routine and further
> complicate the issue?  I don't see why it's necessary.

Because I almost never want current repr() or str() at the prompt, and even
you <wink> don't want 3.1416-3.141 to display 0.0005999999999999339 (which
is the least you can print and have eval return the true answer).
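
A quick check of that figure, again assuming IEEE-754 doubles:

    x = 3.1416 - 3.141
    print(repr(x))                              # far more digits than "0.0006"
    print(float('0.0005999999999999339') == x)  # true:  that string evals back to x
    print(float('0.00059999999999993') == x)    # false: shave digits and x is lost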

>>> What should really happen is that floats intelligently print in
>>> the shortest and simplest manner possible

>> This can be done, but only if Python does all fp I/O conversions
>> entirely on its own -- 754-conforming libc routines are inadequate
>> for this purpose

> Not "all fp I/O conversions", right?  Only repr(float) needs to
> be implemented for this particular purpose.  Other conversions
> like "%f" and "%g" can be left to libc, as they are now.

No, all, else you risk %f and %g producing results that are inconsistent
with repr(), which creates yet another set of incomprehensible surprises.
This is not an area that rewards half-assed hacks!  I'm intimately familiar
with just about every half-assed hack that's been tried here over the last
20 years -- they never work in the end.  The only approach that ever bore
fruit was 754's "there is *a* mathematically correct answer, and *that's*
the one you return".  Unfortunately, they dropped the ball here on
float<->string conversions (and very publicly regret that today).

> I suppose for convenience's sake it may be nice to add another
> format spec so that one can ask for this behaviour from the "%"
> operator as well, but that's a separate issue (perhaps "%r" to
> insert the repr() of an argument of any type?).

%r is cool!  I like that.
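
For concreteness, the proposal would presumably read like this (a sketch of
the idea -- no such spec exists in 1.6):

    x = 0.1
    print('%r' % x)    # hypothetical spec: inserts repr(x), just as %s inserts str(x)
    print('%s' % x)    # for comparison: the str(x) text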

>>>     def smartrepr(x):
>>>         p = 17
>>>         while eval('%%.%df' % (p - 1) % x) == x: p = p - 1
>>>         return '%%.%df' % p % x

>> This merely exposes accidents in the libc on the specific
>> platform you run it.  That is, after
>>
>>     print smartrepr(x)
>>
>> on IEEE-754 platform A, reading that back in on IEEE-754
>> platform B may not yield the same number platform A started with.

> That is not repr()'s job.  Once again:
>
>     repr() is not for the machine.

And once again, I didn't and don't agree with that, and, to save the next
seven msgs, never will <wink>.

> It is not part of repr()'s contract to ensure the kind of
> platform-independent conversion you're talking about.  It
> prints out the number in a way that upholds the eval(repr(x)) == x
> contract for the system you are currently interacting with, and
> that's good enough.

It's not good enough for Java and Scheme, and *shouldn't* be good enough for
Python.  The 1.6 repr(float) is already platform-independent across IEEE-754
machines (it's not correctly rounded on most platforms, but *does* print
enough that 754 guarantees bit-for-bit reproducibility) -- and virtually all
Python platforms are IEEE-754 (I don't know of an exception -- perhaps
Python is running on some ancient VAX?).  The std has been around for 15+
years, virtually all platforms support it fully now, and it's about time
languages caught up.
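
That guarantee is cheap to verify -- a sketch assuming IEEE-754 doubles (the
bits() helper is mine):

    import struct

    def bits(x):
        # Raw byte pattern of the double, so equality really is bit-for-bit.
        return struct.pack('>d', x)

    x = 0.1
    s = '%.17g' % x                    # the 17 significant digits 1.6's repr prints
    print(s)                           # 0.10000000000000001
    print(bits(float(s)) == bits(x))   # true on any 754 box: identical bits back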

BTW, the 1.5.2 text-mode pickle was *not* sufficient for reproducing floats
either, even on a single machine.  It is now -- but thanks to the change in
repr.
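
In other words (a sketch -- protocol 0 being the text-mode pickle):

    import pickle

    x = 0.1
    s = pickle.dumps(x, 0)          # text-mode pickle of a float
    print(pickle.loads(s) == x)     # true now; with the old short repr it could fail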

> If you wanted platform-independent serialization, you would
> use something else.

There is nothing else.  In 1.5.2 and before, people mucked around with
binary dumps hoping they didn't screw up endianness.
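
The mucking-around looked roughly like this (a sketch of the old
do-it-yourself workaround, not anything pickle did for you):

    import struct

    x = 0.1
    blob = struct.pack('<d', x)                 # raw 8-byte IEEE double, byte order
                                                # pinned to little-endian by hand
    print(struct.unpack('<d', blob)[0] == x)    # true -- provided both ends agree on '<'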

> As long as the language reference says
>
>     "These represent machine-level double precision floating
>     point numbers. You are at the mercy of the underlying
>     machine architecture and C implementation for the accepted
>     range and handling of overflow."
>
> and until Python specifies the exact sizes and behaviours of
> its floating-point numbers, you can't expect these kinds of
> cross-platform guarantees anyway.

There's nothing wrong with exceeding expectations <wink>.  Despite what the
reference manual says, virtually all machines use identical fp
representations today (this wasn't true when the text above was written).

>     str()'s contract:
>       - if x is a string, str(x) == x
>       - otherwise, str(x) is a reasonable string coercion from x

The last is so vague as to say nothing.  My counterpart-- at least equally
vague --is

        - otherwise, str(x) is a string that's easy to read and contains
          a compact summary indicating x's nature and value in general
          terms
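
For a float the two contracts read very differently (output assumes the
1.6-style behaviour, with str() using far fewer digits than repr()):

    x = 0.1
    print(str(x))    # 0.1                 -- a compact, human-oriented summary
    print(repr(x))   # 0.10000000000000001 -- enough digits to get x back exactly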

>     repr()'s contract:
>       - if repr(x) is syntactically valid, eval(repr(x)) == x
>       - repr(x) displays x in a safe and readable way

I would say instead:

        - every character c in repr(x) has ord(c) in range(32, 128)
        - repr(x) should strive to be easily readable by humans

>       - for objects composed of basic types, repr(x) reflects
>           what the user would have to say to produce x

Given your first point, does this say something other than "for basic types,
repr(x) is syntactically valid"?  Also unclear what "basic types" means.


>     pickle's contract:
>       - pickle.dumps(x) is a platform-independent serialization
>         of the value and state of object x

Since pickle can't handle all objects, this exaggerates the difference
between it and repr.  Give a fuller description, like

        - If pickle.dumps(x) is defined,
          pickle.loads(pickle.dumps(x)) == x

and it's the same as the first line of your repr() contract, modulo

    s/syntactically valid/is defined/
    s/eval/pickle.loads/
    s/repr/pickle.dumps/

The differences among all these guys remain fuzzy to me.
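
Fuzzy or not, the parallel itself is easy to demo (a sketch):

    import pickle

    x = {'pi': 3.1416, 'tenth': 0.1}
    print(eval(repr(x)) == x)                   # the repr/eval round trip
    print(pickle.loads(pickle.dumps(x)) == x)   # the dumps/loads round trip -- same
                                                # shape of promise, different pair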

but-not-surprising-when-talking-about-what-people-like-to-look-at-ly
    y'rs  - tim