4 hundred quadrillionth?

Mark Dickinson dickinsm at gmail.com
Wed May 27 16:26:57 EDT 2009


Luis Zarrabeitia <kyrie at uh.cu> wrote:
> On Thursday 21 May 2009 08:50:48 pm R. David Murray wrote:
> 
>> In py3k Eric Smith and Mark Dickinson have implemented Gay's floating
>> point algorithm for Python so that the shortest repr that will round
>> trip correctly is what is used as the floating point repr....
> 
> Little question: what was the goal of such a change? (is there a pep for me to 
> read?) Shouldn't str() do that, and leave repr as is?

It's a good question.  I was prepared to write a PEP if necessary, but
there was essentially no opposition to this change in the python-dev
thread that Ned already mentioned, in the bugs.python.org feature
request (see http://bugs.python.org/issue1580; set aside half an hour
or so if you want to read this one), or amongst the people we spoke to
at PyCon 2009, so in the end Eric and I just went ahead and merged the
changes.  It didn't hurt that Guido supported the idea.

I think the main goal was to see fewer complaints from newbie users
about 0.1 displaying as 0.10000000000000001.  There's no real reason
to produce 17 digits here.  Neither 0.1 nor 0.10000000000000001
displays the true value of the float---both are approximations, so why
not pick the approximation that actually displays nicely?  The only
requirement is that float(repr(x)) recovers x exactly, and since 0.1
produced the float in the first place, it's clear that taking
repr(0.1) to be '0.1' satisfies this requirement.
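
Note that some floats genuinely need all 17 significant digits to
round trip, and with the new repr those still get them; for example,
this is what 3.1 gives on my machine:

>>> x = 0.1 + 0.2
>>> x
0.30000000000000004
>>> float(repr(x)) == x
True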

The problem is particularly acute with the use of the round function,
where newbies complain that round is buggy because it's not rounding
to 2 decimal places:

>>> round(2.45311, 2)
2.4500000000000002

With the new float repr, the result of rounding a float to 2 decimal
places will always display with at most 2 places after the point.
(Well, except possibly when that float is very large.)
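
For instance, the example above should now display as you'd expect
(again, this is 3.1 on my machine):

>>> round(2.45311, 2)
2.45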

Of course, there are still going to be complaints that the following
is rounding in the wrong direction:

>>> round(0.075, 2)
0.07
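
The underlying reason is the usual one: the double stored for the
literal 0.075 is a little smaller than 0.075, so rounding it to 2
places really should give 0.07.  One way to see this without printing
dozens of digits:

>>> from fractions import Fraction
>>> Fraction.from_float(0.075) < Fraction(75, 1000)
True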

I'll admit to feeling a bit uncomfortable about the fact that the new
repr goes a little bit further towards hiding floating-point
difficulties from numerically-naive users.

The main things I like about the new representation are that its
definition is saner (give me the shortest string that rounds back to
the original float, versus format to 17 significant digits and then
somewhat arbitrarily strip trailing zeros) and that it's more
consistent than the old one.  With
the current 2.6/3.0 repr (on my machine; your results may vary):

>>> 0.01
0.01
>>> 0.02
0.02
>>> 0.03
0.029999999999999999
>>> 0.04
0.040000000000000001

With Python 3.1:

>>> 0.01
0.01
>>> 0.02
0.02
>>> 0.03
0.03
>>> 0.04
0.04

A cynical response would be to say that the Python 2.6 repr lies only
some of the time; with Python 3.1 it lies *all* of the time.  But
actually all of the above outputs are lies; it's just that the second
set of lies is more consistent and better looking.
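
They're also lies about exactly the same values: the 2.6 string for
0.03 and the 3.1 string convert back to the same double, e.g.

>>> float('0.029999999999999999') == float('0.03')
True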

There are also a number of significant 'hidden' benefits to using
David Gay's code instead of the system C library's functions, though
those benefits are mostly independent of the choice to use the short
float repr:

- the float repr is much more likely to be consistent across platforms
  (or at least across those platforms using IEEE 754 doubles, which
   seems to be 99.9% of them)

- the C library double<->string conversion functions are buggy on many
  platforms (including at least OS X, Windows and some flavours of
  Linux).  While I won't claim that Gay's code (or our adaptation of
  it) is bug-free, I don't know of any bugs (reports welcome!) and at
  least when bugs are discovered it's within our power to fix them.
  Here's one example of an x == eval(repr(x)) failure due to a bug in
  the OS X implementation of strtod:

  >>> x = (2**52-1)*2.**(-1074)
  >>> x
  2.2250738585072009e-308
  >>> y = eval(repr(x))
  >>> y
  2.2250738585072014e-308
  >>> x == y
  False

- similar to the last point: on many platforms string formatting is
  not correctly rounded, in the sense that e.g. '%.6f' % x does not
  necessarily produce the closest decimal to x with 6 places after
  the decimal point.  This is *not* a platform bug, since there's no
  requirement of correct rounding in the C standards.  However, David
  Gay's code does provide correctly rounded string -> double and
  double -> string conversions, so Python's string formatting will now
  always be correctly rounded.  A small thing, but it's nice to have.

- since round() and string formatting now both use Gay's code, we can
  finally guarantee that they give equivalent results: e.g., that the
  digits in round(x, 2) are the same as the digits in '%.2f' % x
  (there's a quick check of this just after the list).  That wasn't
  true before: round could round up while '%.2f' % x rounded down (or
  vice versa), leading to confusion and at least one semi-bogus bug
  report.

- a lot of internal cleanup has become possible as a result of not
  having to worry about all the crazy things that platform string <->
  double conversions can do.  This makes the CPython code smaller,
  clearer, easier to maintain, and less likely to contain bugs.
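
Here's that quick (and entirely unscientific) check of the agreement
between round and string formatting; given the guarantee above, it
should print True on 3.1 (and if it ever doesn't, that's a bug report
I'd like to see):

>>> import random
>>> xs = [random.uniform(-1000.0, 1000.0) for _ in range(10**5)]
>>> all('%.2f' % round(x, 2) == '%.2f' % x for x in xs)
True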

> While I agree that the change gets rid of the weekly newbie question 
> about "python's lack precision", I'd find more difficult to explain why
> 0.2 * 3 != 0.6 without showing them what 0.2 /really/ means.

There are still plenty of ways to show what 0.2 really means.  My
favourite is to use the Decimal.from_float method:

>>> from decimal import Decimal
>>> Decimal.from_float(0.2)
Decimal('0.200000000000000011102230246251565404236316680908203125')

This is only available in 2.7 and 3.1, but then the repr change isn't
happening until 3.1 (and it almost certainly won't be backported to
2.7, by the way), so that's okay.  But there's also float.hex,
float.as_integer_ratio, and Fraction.from_float to show the exact
value that's stored for a float.

>>> 0.2.hex()
'0x1.999999999999ap-3'
>>> Fraction.from_float(0.2)
Fraction(3602879701896397, 18014398509481984)

Hmm.  That was a slightly unfortunate choice of example: the hex form
of 0.2 looks uncomfortably similar to 1.9999999....  An interesting
cross-base accident.
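
For completeness, float.as_integer_ratio (mentioned above but not
shown) gives the same numerator and denominator directly:

>>> (0.2).as_integer_ratio()
(3602879701896397, 18014398509481984)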

This is getting rather long.  Perhaps I should put the above comments
together into a 'post-PEP' document.

Mark


