[Python-Dev] Expert floats

Tim Peters tim.one at comcast.net
Tue Mar 30 20:43:56 EST 2004


[Ping]
> All right.  Maybe we can make some progress.

Probably not -- we have indeed been thru all of this before.

> I agree that round-to-12 was a real problem.  But i think we are
> talking about two different use cases: compiling to disk and
> displaying on screen.

Those are two of the use cases, yes.

> I think we can satisfy both desires.
>
> If i understand you right, your primary aim is to make sure the
> marshalled form of any floating-point number yields the closest
> possible binary approximation to the machine value on the original
> platform,

I want marshaling of fp numbers to give exact (not approximate) round-trip
equality on a single box, and across all boxes supporting the 754 standard
where C maps "double" to a 754 double.  I also want marshaling to preserve
as much accuracy as possible across boxes with different fp formats,
although that may not be practical.  Strings have nothing to do with that,
except for the historical accident that marshal happens to use decimal
strings.  Changing repr(float) to produce 17 digits went a very long way
toward achieving all that at the time, with minimal code changes.  The
consequences of that change that I *really* like didn't become apparent
for years.
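
A minimal sketch of string-free round-tripping, assuming the struct module's
standard '<d' format (8 bytes, IEEE 754 binary64, little-endian).  The helper
names pack_double and unpack_double are hypothetical, and this is not a claim
about how marshal itself stores floats:

```python
import struct

def pack_double(x):
    """Serialize a float as 8 bytes: IEEE 754 binary64, little-endian."""
    return struct.pack('<d', x)

def unpack_double(data):
    """Inverse of pack_double."""
    return struct.unpack('<d', data)[0]

# Hypothetical sample values, chosen to include some awkward cases.
samples = [0.1, 1.0 / 3.0, 0.1 + 0.2, 5e-324, 1.7976931348623157e308]
restored = [unpack_double(pack_double(x)) for x in samples]
print(restored == samples)
```

No decimal conversion happens in either direction, so equality on 754 boxes
does not depend on the quality of any C library's string routines.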

> even when that representation is used on a different platform.  (Is
> that correct? Perhaps it's best if you clarify -- exactly what is the
> invariant you want to maintain, and what changes [in platform or
> otherwise] do you want the representation to withstand?)

As above.  Beyond exact equality across suitable 754 boxes, we'd have to
agree on a parameterized model of fp, and explain "as much accuracy as
possible" in terms of that.  But you don't care, so I won't bother <wink>.

> That doesn't have to be part of repr()'s contract.  (In fact, i would
> argue that already repr() makes no such promise.)

It doesn't, but the docs do say:

    If at all possible, this [repr's result] should look like a valid
    Python expression that could be used to recreate an object with the
    same value (given an appropriate environment).

This is possible for repr(float), and is currently true for repr(float) (on
754-conforming boxes).

> repr() is about providing a representation for humans.

I think the docs are quite clear that this function belongs to str():

    .... the ``informal'' string representation of an object.  This
    differs from __repr__() in that it does not have to be a valid
    Python expression: a more convenient or concise representation
    may be used instead. The return value must be a string object.

> Can we agree on maximal precision for marshalling,

I don't want to use strings at all for marshaling.  So long as we do, 17
significant digits is already correct for that purpose (fewer than 17
doesn't always preserve equality, and more than 17 can't be relied on
across 754-conforming C libraries).
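
A small sketch of the 17-digit claim, assuming a Python whose float() and %
formatting are correctly rounded (true of current CPython).  The helper name
round_trips and the sample range are made up for illustration:

```python
import random

def round_trips(x, digits):
    """True if formatting x with `digits` significant digits and parsing
    the result back yields exactly x."""
    return float('%.*g' % (digits, x)) == x

x = 0.1 + 0.2
print('%.16g' % x, round_trips(x, 16))   # 16 digits lose this value
print('%.17g' % x, round_trips(x, 17))   # 17 digits recover it

# Spot-check 17 digits on a fixed pseudo-random sample.
rng = random.Random(2004)
samples = [rng.uniform(-1e6, 1e6) for _ in range(1000)]
print(all(round_trips(v, 17) for v in samples))
```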

> and shortest-accurate precision for repr, so we can both be happy?

As I said before (again and again and again <wink>), I'm the one who has
fielded most newbie questions about fp since Python's beginning, and I'm
very happy with the results of changing repr() to produce 17 digits.  They
get a little shock at the start now, but potentially save themselves from
catastrophe by being forced to grow some *necessary* caution about fp
results early.

So, no, we're not going to agree on this.  My answer for newbies who don't
know and don't care (and who are determined never to know or care) has
always been to move to a Decimal module.  That's less surprising than binary
fp in several ways, and 2.4 will have it.

> (By shortest-accurate i mean "the shortest representation that
> converts to the same machine number".  I believe this is exactly what
> Andrew described as Scheme's method.

Yes, except that Scheme also requires that this string be correctly rounded
to however many digits are produced.  A string s just satisfying

    eval(s) == some_float

needn't necessarily be correctly rounded to s's precision.
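
To illustrate the distinction, here is a sketch using 2.0**60, again assuming
correctly rounded conversions in both directions.  The variable names are
hypothetical; the point is only that two different 17-digit strings convert
to the same double, and just one of them is the correctly rounded one:

```python
x = 2.0 ** 60                         # exactly 1152921504606846976

# 17 significant digits, correctly rounded from the exact value.
correctly_rounded = '%.16e' % x

# Same number of digits, last digit off by one.  It is still closer to x
# than to either neighboring double, so it converts back to x.
impostor = '1.1529215046068471e+18'

print(correctly_rounded)
print(float(impostor) == x)
print(impostor == correctly_rounded)
```

Both strings satisfy eval(s) == x, but only the first is what correct
rounding to 17 digits produces.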

> If you are very concerned about this being a complex and/or slow
> operation, a fine compromise would be a "try-12" algorithm: if %.12g
> is accurate then use it, and otherwise use %.17g. This is simple,
> easy to implement, produces reasonable results in most cases, and
> has a small bound on CPU cost.)

Also unique to Python.

>     def try_12(x):
>         rep = '%.12g' % x
>         if float(rep) == x: return rep
>         return '%.17g' % x
>
>     def shortest_accurate(x):
>         for places in range(17):
>             fmt = '%.' + str(places) + 'g'
>             rep = fmt % x
>             if float(rep) == x: return rep
>         return '%.17g' % x

Our disagreement is more fundamental than that.  Then again, it always has
been <smile!>.
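
For reference, a sketch that exercises the two functions quoted above, copied
unchanged apart from added comments.  The sample inputs are arbitrary, and the
results assume correctly rounded float() and % formatting:

```python
def try_12(x):
    # Prefer 12 significant digits; fall back to 17 if that loses x.
    rep = '%.12g' % x
    if float(rep) == x: return rep
    return '%.17g' % x

def shortest_accurate(x):
    # Try increasing precision until the string converts back to x.
    for places in range(17):
        fmt = '%.' + str(places) + 'g'
        rep = fmt % x
        if float(rep) == x: return rep
    return '%.17g' % x

for value in (0.1, 1.5, 1.0 / 3.0, 0.1 + 0.2):
    print(try_12(value), shortest_accurate(value))
```

For 1.0 / 3.0 the two disagree: try_12 jumps straight to 17 digits, while
shortest_accurate stops at the 16 that suffice.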



