[Python-Dev] Expert floats

Tim Peters tim.one at comcast.net
Tue Mar 30 18:24:03 EST 2004


[Tim]
>> It is, but it wasn't practical.  754 requires that a float->string
>> conversion done to 17 significant digits, then back to float again,
>> will reproduce the original float exactly.

[Andrew]
> But that rules out those hypothetical machines with greater precision
> on which you are basing your argument.

Sorry, couldn't follow that one.
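The round-trip requirement quoted above is easy to check directly.  Here's a
sketch, assuming IEEE-754 doubles and correctly rounded platform conversions
(which modern CPython has):

```python
import struct

def bits(x):
    # Raw 64-bit pattern of an IEEE double.
    return struct.unpack('<Q', struct.pack('<d', x))[0]

# Format each value with 17 significant digits, parse it back, and
# verify the result is bit-for-bit identical to the original.
for x in [0.1, 1.0 / 3.0, 1e308, 2.0 ** -1022]:
    s = '%.17g' % x
    assert bits(float(s)) == bits(x)
```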

>> It doesn't require perfect rounding (there are different
>> accuracy requirements over different parts of the domain -- it's
>> complicated), and it doesn't require that a conforming float->string
>> operation be able to produce more than 17 meaningful digits.

> I thought that 754 requires input and output to be no more than 0.47
> LSB away from exact.

No, and no standard can ask for better than 0.5 ULP error (when the true
result is halfway between two representable quantities, 0.5 ULP is the
smallest possible error "even in theory").  It requires perfect rounding for
"suitably small" inputs; outside that range, and excepting
underflow/overflow:

- for nearest/even rounding, it requires no more than 0.47 ULP error
  *beyond* that allowed for perfect nearest/even conversion (which
  has a max error of 0.5 ULP on its own)

- for directed rounding, no more than 1.47 ULP error (and with the
  correct sign)
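For concreteness, an ULP (unit in the last place) is the spacing between
adjacent representable floats at a given magnitude.  Modern Python (3.9+,
so an anachronism here) exposes it directly:

```python
import math

# math.ulp(x) is the gap between x and the next representable double.
print(math.ulp(1.0))    # 2**-52 for IEEE doubles
print(math.ulp(1e16))   # 2.0: above 2**53, adjacent doubles differ by > 1

# A correctly rounded conversion lands within 0.5 ULP of the true value;
# the 754 float->string bound adds at most another 0.47 ULP to that.
assert math.ulp(1.0) == 2.0 ** -52
```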

> Surely the spirit of 754 would require more than 17 significant
> digits on machines with more than 56-bit fractions.

Yes, and the derivation of "17" for IEEE double format isn't hard.  Of
course the 754 standard doesn't say anything about non-754 architectures;
there are generalizations in the related 854 standard.

...

> Understood.  What I meant when I started this thread was that I think
> things would be better in some ways if Python did not rely on the
> underlying C library for its floating-point conversions--especially
> in light of the fact that not all C libraries meet the 754
> requirements for conversions.

No argument there.  In fact, it would be better in some ways if Python
didn't rely on the platform C libraries for anything.

> ...
> Naah - I also suggested it because I like the Scheme style of
> conversions, and because I happen to know David Gay personally.  I
> have no opinion about how easy his code is to maintain.

It's not "David Gay" at issue, it's that this code is trying to do an
extremely delicate and exacting task in a language that offers no native
support.  So here's a snippet of the #ifdef maze at the top:

"""
#else /* ifndef IEEE_Arith */
#undef Check_FLT_ROUNDS
#undef Honor_FLT_ROUNDS
#undef SET_INEXACT
#undef  Sudden_Underflow
#define Sudden_Underflow
#ifdef IBM
#undef Flt_Rounds
#define Flt_Rounds 0
#define Exp_shift  24
#define Exp_shift1 24
#define Exp_msk1   0x1000000
#define Exp_msk11  0x1000000
#define Exp_mask  0x7f000000
#define P 14
#define Bias 65
#define Exp_1  0x41000000
#define Exp_11 0x41000000
#define Ebits 8	/* exponent has 7 bits, but 8 is the right value in b2d */
#define Frac_mask  0xffffff
#define Frac_mask1 0xffffff
#define Bletch 4
#define Ten_pmax 22
#define Bndry_mask  0xefffff
#define Bndry_mask1 0xffffff
#define LSB 1
#define Sign_bit 0x80000000
#define Log2P 4
#define Tiny0 0x100000
#define Tiny1 0
#define Quick_max 14
#define Int_max 15
#else /* VAX */
#undef Flt_Rounds
#define Flt_Rounds 1
#define Exp_shift  23
#define Exp_shift1 7
#define Exp_msk1    0x80
#define Exp_msk11   0x800000
#define Exp_mask  0x7f80
#define P 56
#define Bias 129
#define Exp_1  0x40800000
#define Exp_11 0x4080
#define Ebits 8
#define Frac_mask  0x7fffff
#define Frac_mask1 0xffff007f
#define Ten_pmax 24
#define Bletch 2
#define Bndry_mask  0xffff007f
#define Bndry_mask1 0xffff007f
#define LSB 0x10000
#define Sign_bit 0x8000
#define Log2P 1
#define Tiny0 0x80
#define Tiny1 0
#define Quick_max 15
#define Int_max 15
#endif /* IBM, VAX */
#endif /* IEEE_Arith */

#ifndef IEEE_Arith
#define ROUND_BIASED
#endif

#ifdef RND_PRODQUOT
#define rounded_product(a,b) a = rnd_prod(a, b)
#define rounded_quotient(a,b) a = rnd_quot(a, b)
#ifdef KR_headers
extern double rnd_prod(), rnd_quot();
#else
extern double rnd_prod(double, double), rnd_quot(double, double);
#endif
#else
#define rounded_product(a,b) a *= b
#define rounded_quotient(a,b) a /= b
#endif

#define Big0 (Frac_mask1 | Exp_msk1*(DBL_MAX_EXP+Bias-1))
#define Big1 0xffffffff

#ifndef Pack_32
#define Pack_32
#endif

#ifdef KR_headers
#define FFFFFFFF ((((unsigned long)0xffff)<<16)|(unsigned long)0xffff)
#else
#define FFFFFFFF 0xffffffffUL
#endif

#ifdef NO_LONG_LONG
#undef ULLong
#ifdef Just_16
#undef Pack_32
/* When Pack_32 is not defined, we store 16 bits per 32-bit Long.
 * This makes some inner loops simpler and sometimes saves work
 * during multiplications, but it often seems to make things slightly
 * slower.  Hence the default is now to store 32 bits per Long.
 */
#endif
#else	/* long long available */
#ifndef Llong
#define Llong long long
#endif
#ifndef ULLong
#define ULLong unsigned Llong
"""

and a snippet of some #ifdef'ed guts (assuming that it's obvious why Bletch
is #define'd to 2 on some platforms but 4 on others <wink>):

"""
#ifdef Pack_32
	if (k < Ebits) {
		d0 = Exp_1 | y >> Ebits - k;
		w = xa > xa0 ? *--xa : 0;
		d1 = y << (32-Ebits) + k | w >> Ebits - k;
		goto ret_d;
		}
	z = xa > xa0 ? *--xa : 0;
	if (k -= Ebits) {
		d0 = Exp_1 | y << k | z >> 32 - k;
		y = xa > xa0 ? *--xa : 0;
		d1 = z << k | y >> 32 - k;
		}
	else {
		d0 = Exp_1 | y;
		d1 = z;
		}
#else
	if (k < Ebits + 16) {
		z = xa > xa0 ? *--xa : 0;
		d0 = Exp_1 | y << k - Ebits | z >> Ebits + 16 - k;
		w = xa > xa0 ? *--xa : 0;
		y = xa > xa0 ? *--xa : 0;
		d1 = z << k + 16 - Ebits | w << k - Ebits | y >> 16 + Ebits - k;
		goto ret_d;
		}
	z = xa > xa0 ? *--xa : 0;
	w = xa > xa0 ? *--xa : 0;
	k -= Ebits + 16;
	d0 = Exp_1 | y << k + 16 | z << k | w >> 16 - k;
	y = xa > xa0 ? *--xa : 0;
	d1 = w << k + 16 | y << k;
#endif
 ret_d:
#ifdef VAX
	word0(d) = d0 >> 16 | d0 << 16;
	word1(d) = d1 >> 16 | d1 << 16;
#else
#undef d0
#undef d1
#endif
"""

There are over 3,000 lines of code "like that" in dtoa.c alone.  "Obviously
correct" isn't obvious, and some days I think I'd rather track down a bug in
Unicode.

> I completely agree that if you're going to rely on the underlying C
> implementation for floating-point conversions, there's little point in
> trying to do anything really good--C implementations are just too
> variable.

Well, so far as marshal goes (storing floats in code objects), we could and
should stop trying to use decimal strings at all -- Python's 8-byte binary
pickle format for floats is portable and is exact for (finite) IEEE doubles.



