[Python-Dev] Two proposed changes to float formatting

Sun Apr 26 12:06:56 CEST 2009

I'd like to propose two minor changes to float and complex
formatting, for 3.1.  I don't think either change should prove
particularly disruptive.

(1) Currently, '%f' formatting automatically changes to '%g' formatting for
numbers whose absolute value exceeds 1e50.  For example:

>>> '%f' % 2**166.
'93536104789177786765035829293842113257979682750464.000000'
>>> '%f' % 2**167.
'1.87072e+50'

I propose removing this feature for 3.1.

More details: the current behaviour is documented in the Library
Reference, under Built-in Types.  (Until very recently, it was actually
misdocumented as switching at 1e25 rather than 1e50.)

"""For safety reasons, floating point precisions are clipped to 50; %f
conversions for numbers whose absolute value is over 1e50 are
replaced by %g conversions. [5] All other errors raise exceptions."""
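For concreteness, here is a minimal Python sketch of the documented
switch.  The function name `old_percent_f` is hypothetical; the sketch
ignores the precision clipping mentioned in the same paragraph and
assumes the switch happens strictly above 1e50 ("over 1e50"), as the
docs say.

```python
def old_percent_f(x: float) -> str:
    """Hypothetical emulation of the documented pre-3.1 rule: '%f'
    falls back to '%g' once abs(x) exceeds 1e50."""
    if abs(x) > 1e50:
        return '%g' % x
    return '%f' % x

print(old_percent_f(2.0**166))  # full fixed-point expansion
print(old_percent_f(2.0**167))  # 1.87072e+50
```

Note that only the '%g' branch reproduces the old output exactly; the
'%f' branch simply defers to whatever '%f' does in the Python running it.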

There's even a footnote:

"""[5]	These numbers are fairly arbitrary. They are intended to
avoid printing endless strings of meaningless digits without
hampering correct use and without having to know the exact
precision of floating point values on a particular machine."""

I don't find this particularly convincing, though: I just don't see
a good reason not to give the user exactly what she/he asks for
here.  I suspect that at least part of the motivation for the
'%f' -> '%g' switch is that it lets the implementation use a
fixed-size buffer.  But Eric has fixed this (in 3.1, at least): the
buffer is now dynamically allocated, so this is no longer a concern.
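To give a sense of how large a fixed-size buffer would have to be
without the switch: the largest finite double has 309 digits before
the decimal point.  A quick check in a current Python (this only
measures output length; it says nothing about how the C code allocates
its buffer):

```python
import sys

# Plain '%f' (6 decimal places by default) of the largest finite double:
# 309 integer digits, then '.', then 6 fractional digits.
widest = '%f' % sys.float_info.max
print(len(widest))
print(len('%f' % -sys.float_info.max))  # one more character for the sign
```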

Other reasons not to switch from '%f' to '%g' in this way:

 - the change isn't gentle:  as you go over the 1e50 boundary,
   the number of significant digits produced suddenly drops
   from 56 to 6;  it would make more sense to me if it
   stayed at 56 significant digits for numbers larger than 1e50.
 - now that we're using David Gay's 'perfect rounding'
   code, we can be sure that the digits aren't entirely
   meaningless, or at least that they're the 'right' meaningless
   digits.  This wasn't true before.
 - C doesn't do this, and the %f, %g, %e formats really
   owe their heritage to C.
 - float formatting is already complicated enough; there's no
   need to add to the mental burden.
 - removal simplifies the implementation :-)
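The first two points can be checked directly.  Below is a small sketch,
with a hypothetical helper `sig_digits`, that counts the digits produced
on either side of the boundary; it then checks that, in a current
CPython (which uses correctly rounded conversions), the fixed-point
digits of 2**167 are the exact decimal value of the double rather
than noise.

```python
def sig_digits(s: str) -> int:
    """Hypothetical helper: count the significant digits in a
    formatted float string (trailing zeros count, since they are printed)."""
    mantissa = s.lower().split('e')[0]            # drop any exponent
    digits = mantissa.lstrip('+-').replace('.', '')
    return len(digits.lstrip('0'))                # leading zeros don't count

below = '%f' % 2.0**166   # just under 1e50: fixed-point output
above = '%g' % 2.0**167   # just over 1e50: what the old rule produced
print(sig_digits(below), sig_digits(above))   # 56 6

# Correctly rounded conversion: '%.0f' gives the exact integer value.
assert int('%.0f' % 2.0**167) == 2**167
```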

On to the second proposed change:

(2) complex str and repr don't behave like float str and repr, in that
the float version always adds a trailing '.0' (unless there's an
exponent), but the complex version doesn't:

>>> 4., 10.
(4.0, 10.0)
>>> 4. + 10.j
(4+10j)

I propose changing the complex str and repr to behave like the
float version.  That is, repr(4. + 10.j) should be "(4.0+10.0j)"
rather than "(4+10j)".
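One way to picture the proposal is to build the complex repr out of the
float reprs of its parts.  This is only a sketch: `proposed_complex_repr`
is a hypothetical name, it keeps the current unspaced "(4+10j)" layout
and adds nothing but the '.0', and, like the current repr, it omits a
real part equal to +0.0.

```python
import math

def proposed_complex_repr(z: complex) -> str:
    """Hypothetical sketch: reuse repr(float) for each part so that
    integral values keep their trailing '.0'."""
    real, imag = repr(z.real), repr(z.imag)
    # Like the current repr, drop a real part that is exactly +0.0.
    if z.real == 0.0 and math.copysign(1.0, z.real) > 0:
        return imag + 'j'
    # repr(float) already supplies the '-' for negative imaginary parts.
    sign = '' if imag.startswith('-') else '+'
    return '(' + real + sign + imag + 'j)'

print(proposed_complex_repr(4. + 10.j))   # (4.0+10.0j)
print(repr(4. + 10.j))                    # (4+10j), the current output
```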

Mostly this is just about consistency, ease of implementation,
and aesthetics.  As far as I can tell, the extra '.0' in the float
repr serves two closely related purposes:  it makes it clear to
the human reader that the number is a float rather than an
integer, and it ensures that, e.g., eval(repr(x)) recovers a
float rather than an int.  The latter point isn't a concern for
the current complex repr (eval('(4+10j)') already gives a complex),
but the former is:  4+10j looks to me more like a Gaussian integer
than a complex number.
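The round-trip argument can be seen directly.  This sketch uses
ast.literal_eval, which for these simple literals gives the same
result as eval: the '.0' decides the type for a float repr, but a
complex repr evaluates to a complex either way.

```python
import ast

# For floats, the trailing '.0' is what makes the result a float:
print(type(ast.literal_eval('4.0')).__name__,
      type(ast.literal_eval('4')).__name__)            # float int

# For complex numbers, both spellings give the same complex value:
print(ast.literal_eval('(4+10j)') == ast.literal_eval('(4.0+10.0j)'))  # True
```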

Any comments?

Mark