[Python-Dev] Round Bug in Python 1.6?
Tim Peters
tim_one@email.msn.com
Sun, 9 Apr 2000 16:14:17 -0400
[Tim]
>> If they're surprised by this, they indeed don't understand the
>> arithmetic at all! This is an argument for using a different form of
>> arithmetic, not for lying about reality.
> This is not lying!
Yes, I overstated that. It's not lying, but I defy anyone to explain the
full truth of it in a way even Guido could understand <0.9 wink>. "Shortest
conversion" is a subtle concept, requiring knowledge not only of the
mathematical value, but of details of the HW representation. Plain old
"correct rounding" is HW-independent, so is much easier to *fully*
understand. And in things floating-point, what you don't fully understand
will eventually burn you.
Note that in a machine with 2-bit floating point, the "shortest conversion"
for 0.75 is the string "0.8": this should suggest the sense in which
"shortest conversion" can be actively misleading too.
> If you type in "3.1416" and Python says "3.1416", then indeed it is the
> case that "3.1416" is a correct way to type in the floating-point number
> being expressed. So "3.1415999999999999" is not any more truthful than
> "3.1416" -- it's just more annoying.
Yes, shortest conversion is *defensible*. But Python has no code to
implement that now, so it's not an option today.
> I just tried this in Python 1.5.2+:
>
> >>> .1
> 0.10000000000000001
> >>> .2
> 0.20000000000000001
> >>> .3
> 0.29999999999999999
> >>> .4
> 0.40000000000000002
> >>> .5
> 0.5
> >>> .6
> 0.59999999999999998
> >>> .7
> 0.69999999999999996
> >>> .8
> 0.80000000000000004
> >>> .9
> 0.90000000000000002
>
> Ouch.
As shown in my reply to Christian, shortest conversion is not a cure for
this "gosh, it printed so much more than I expected it to"; it only appears
to "fix it" in the simplest examples. So long as you want
eval(what's_displayed) == what's_typed, this is unavoidable. The only ways
to avoid that are to use a different arithmetic, or stop using repr() at the
prompt.
>> As above. repr() shouldn't be used at the interactive prompt
>> anyway (but note that I did not say str() should be).
> What, then? Introduce a third conversion routine and further
> complicate the issue? I don't see why it's necessary.
Because I almost never want current repr() or str() at the prompt, and even
you <wink> don't want 3.1416-3.141 to display 0.0005999999999999339 (which
is the least you can print and have eval return the true answer).
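A quick way to see this, as a sketch in present-day Python 3 syntax (which postdates this thread). It makes no assumption about how repr() picks its digits, only that the string it returns reads back as the same float:

```python
# Sketch: the difference of two short literals is not a "short" float.
x = 3.1416 - 3.141

shown = repr(x)   # what a repr()-based prompt would display
print(shown)

# The displayed string has to read back as exactly the same float ...
assert float(shown) == x
# ... and the short answer a user might expect is a different float.
assert x != 0.0006
```

Any display that preserves eval(what's_displayed) == what's_typed has to print enough digits to tell x apart from 0.0006, however clever the conversion is.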
>>> What should really happen is that floats intelligently print in
>>> the shortest and simplest manner possible
>> This can be done, but only if Python does all fp I/O conversions
>> entirely on its own -- 754-conforming libc routines are inadequate
>> for this purpose
> Not "all fp I/O conversions", right? Only repr(float) needs to
> be implemented for this particular purpose. Other conversions
> like "%f" and "%g" can be left to libc, as they are now.
No, all, else you risk %f and %g producing results that are inconsistent
with repr(), which creates yet another set of incomprehensible surprises.
This is not an area that rewards half-assed hacks! I'm intimately familiar
with just about every half-assed hack that's been tried here over the last
20 years -- they never work in the end. The only approach that ever bore
fruit was 754's "there is *a* mathematically correct answer, and *that's*
the one you return". Unfortunately, they dropped the ball here on
float<->string conversions (and very publicly regret that today).
> I suppose for convenience's sake it may be nice to add another
> format spec so that one can ask for this behaviour from the "%"
> operator as well, but that's a separate issue (perhaps "%r" to
> insert the repr() of an argument of any type?).
%r is cool! I like that.
>>> def smartrepr(x):
>>>     p = 17
>>>     while eval('%%.%df' % (p - 1) % x) == x: p = p - 1
>>>     return '%%.%df' % p % x
>> This merely exposes accidents in the libc on the specific
>> platform you run it. That is, after
>>
>> print smartrepr(x)
>>
>> on IEEE-754 platform A, reading that back in on IEEE-754
>> platform B may not yield the same number platform A started with.
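For reference, the quoted smartrepr idea can be written so that it runs as-is on a current Python 3. This is only a sketch: the max_places parameter, the float() call in place of eval(), and the repr() fallback are additions for illustration, not part of the original proposal. It does nothing about the objection above, since it still relies on whatever formatting and parsing routines the running interpreter provides.

```python
def smartrepr(x, max_places=17):
    """Fewest decimal places p such that '%.<p>f' % x reads back as x.

    Sketch only. Falls back to repr(x) when even max_places decimal
    places do not round-trip (for example, very small magnitudes).
    """
    if float('%.*f' % (max_places, x)) != x:
        return repr(x)
    p = max_places
    # Stop at p == 0 so the precision never goes negative.
    while p > 0 and float('%.*f' % (p - 1, x)) == x:
        p -= 1
    return '%.*f' % (p, x)


for value in (0.5, 3.1416, 3.1416 - 3.141, 1e-20):
    text = smartrepr(value)
    assert float(text) == value   # holds on the machine that printed it
    print(text)
```

The assertion checks the round trip only on the machine doing the printing, which is exactly the limitation under discussion.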
> That is not repr()'s job. Once again:
>
> repr() is not for the machine.
And once again, I didn't and don't agree with that, and, to save the next
seven msgs, never will <wink>.
> It is not part of repr()'s contract to ensure the kind of
> platform-independent conversion you're talking about. It
> prints out the number in a way that upholds the eval(repr(x)) == x
> contract for the system you are currently interacting with, and
> that's good enough.
It's not good enough for Java and Scheme, and *shouldn't* be good enough for
Python. The 1.6 repr(float) is already platform-independent across IEEE-754
machines (it's not correctly rounded on most platforms, but *does* print
enough that 754 guarantees bit-for-bit reproducibility) -- and virtually all
Python platforms are IEEE-754 (I don't know of an exception -- perhaps
Python is running on some ancient VAX?). The std has been around for 15+
years, virtually all platforms support it fully now, and it's about time
languages caught up.
BTW, the 1.5.2 text-mode pickle was *not* sufficient for reproducing floats
either, even on a single machine. It is now -- but thanks to the change in
repr.
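A sketch of the reproducibility point, assuming IEEE-754 doubles and correctly rounded conversions in the running interpreter. The helper name roundtrips and the sample values are made up for illustration, and the 12-digit comparison is only an example of a shorter format, not something stated in this message:

```python
def roundtrips(x, digits):
    """True if x formatted to `digits` significant digits reads back as x."""
    return float('%.*g' % (digits, x)) == x


samples = [0.1, 1.0 / 3.0, 0.1 + 0.2, 3.1416 - 3.141, 1e300, 5e-324]

# 17 significant digits are enough to identify any double uniquely.
print(all(roundtrips(x, 17) for x in samples))

# A shorter format looks nicer but loses some values.
print([x for x in samples if not roundtrips(x, 12)])
```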
> If you wanted platform-independent serialization, you would
> use something else.
There is nothing else. In 1.5.2 and before, people mucked around with
binary dumps hoping they didn't screw up endianness.
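The binary-dump approach only works when the byte order is pinned down explicitly. A minimal sketch using the standard struct module, with hypothetical helper names dump_double and load_double, and assuming the platform's float is an IEEE-754 double:

```python
import struct


def dump_double(x):
    """Pack x as an 8-byte IEEE-754 double, little-endian on every host."""
    return struct.pack('<d', x)


def load_double(data):
    """Inverse of dump_double: read back a little-endian double."""
    return struct.unpack('<d', data)[0]


blob = dump_double(0.1)
assert len(blob) == 8
assert load_double(blob) == 0.1
```

Leaving out the '<' makes struct use the host's native order, which is the mistake alluded to above.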
> As long as the language reference says
>
> "These represent machine-level double precision floating
> point numbers. You are at the mercy of the underlying
> machine architecture and C implementation for the accepted
> range and handling of overflow."
>
> and until Python specifies the exact sizes and behaviours of
> its floating-point numbers, you can't expect these kinds of
> cross-platform guarantees anyway.
There's nothing wrong with exceeding expectations <wink>. Despite what the
reference manual says, virtually all machines use identical fp
representations today (this wasn't true when the text above was written).
> str()'s contract:
> - if x is a string, str(x) == x
> - otherwise, str(x) is a reasonable string coercion from x
The last is so vague as to say nothing. My counterpart -- at least equally
vague -- is
- otherwise, str(x) is a string that's easy to read and contains
  a compact summary indicating x's nature and value in general
  terms
> repr()'s contract:
> - if repr(x) is syntactically valid, eval(repr(x)) == x
> - repr(x) displays x in a safe and readable way
I would say instead:
- every character c in repr(x) has ord(c) in range(32, 128)
- repr(x) should strive to be easily readable by humans
> - for objects composed of basic types, repr(x) reflects
> what the user would have to say to produce x
Given your first point, does this say something other than "for basic types,
repr(x) is syntactically valid"? Also unclear what "basic types" means.
> pickle's contract:
> - pickle.dumps(x) is a platform-independent serialization
> of the value and state of object x
Since pickle can't handle all objects, this exaggerates the difference
between it and repr. Give a fuller description, like
- If pickle.dumps(x) is defined,
  pickle.loads(pickle.dumps(x)) == x
and it's the same as the first line of your repr() contract, modulo
    s/syntactically valid/is defined/
    s/eval/pickle.loads/
    s/repr/pickle.dumps/
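The conditional form of that contract can be spelled out as a check. This is a sketch in present-day Python 3; the helper name pickle_roundtrips is made up, and the set of exceptions treated as "dumps is not defined" is an assumption about how current CPython reports unpicklable objects:

```python
import pickle


def pickle_roundtrips(x, protocol=0):
    """True/False if x survives dumps/loads; None if dumps() is undefined for x."""
    try:
        data = pickle.dumps(x, protocol)   # protocol 0 is the text-mode pickle
    except (pickle.PicklingError, TypeError, AttributeError):
        return None
    return pickle.loads(data) == x


print(pickle_roundtrips(0.1))                  # a float
print(pickle_roundtrips([1, 2.5, 'a']))        # a list of basic types
print(pickle_roundtrips(i for i in range(3)))  # a generator cannot be pickled
```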
The differences among all these guys remain fuzzy to me.
but-not-surprising-when-talking-about-what-people-like-to-look-at-ly
y'rs - tim