Maths error

Tim Peters tim.one at comcast.net
Wed Jan 10 22:44:54 EST 2007


[Tim Peters]
...
>> Huh.  I don't read it that way.  If it said "numbers can be ..." I 
>> might, but reading that way seems to require effort to overlook the 
>> "decimal" in "decimal numbers can be ...".

[Nick Maclaren]
> I wouldn't expect YOU to read it that way,

Of course I meant "putting myself in others' shoes, I don't ...".

> but I can assure you from experience that many people do.

Sure.  Possibly even most.  Short of writing a long & gentle tutorial, 
can that be improved?  Alas, most people wouldn't read that either <0.5 
wink>.

> What it MEANS is "Numbers with a short representation in decimal

"short" is a red herring here:  Python's Decimal constructor ignores the 
precision setting, retaining all the digits you give.  For example, if 
you pass a string with a million decimal digits, you'll end up with a 
very fat Decimal instance -- no info is lost.
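
For instance, a quick sketch with the decimal module (the long literal 
is just an arbitrary illustration):

    from decimal import Decimal, getcontext

    getcontext().prec = 6                      # precision governs arithmetic only
    d = Decimal("1.234567890123456789012345")  # the constructor keeps every digit
    print(d)      # all 25 digits survive, regardless of the precision setting
    print(d + 0)  # 1.23457 -- rounding happens only when arithmetic is performed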

> can be represented exactly in decimal arithmetic", which is
> tautologous.  What they READ it to mean is "One advantage of
> representing numbers in decimal is that they can be represented
> exactly", and they then assume that also applies to pi, sqrt(2),
> 1/3 ....
>
> The point is that the "decimal" could apply equally well to the
> external or internal representation and, if you aren't fairly
> clued-up in this area, it is easy to choose the wrong one.

Worse, I expect most people have no real idea that there's a possible 
difference between internal and external representations.  This is often 
given as a selling point for decimal arithmetic:  it's WYSIWYG in ways 
binary fp can't be (short of inventing power-of-2 fp representations for 
I/O, which few people would use).
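
A two-line sketch of the WYSIWYG point (the value 0.1 is picked only 
because it's the classic case):

    from decimal import Decimal

    print("%.20f" % 0.1)   # 0.10000000000000000555 -- binary fp stores the nearest base-2 value
    print(Decimal("0.1"))  # 0.1 -- decimal fp stores exactly the digits you typed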

[attribution lost]
>>>>> and how is decimal no better than binary?

[Tim]
>>>> Basically, they both lose info when rounding does occur.  For
>>>> example, 

[Nick]
>>> Yes, but there are two ways in which binary is superior.  Let's
>>> skip the superior 'smoothness', as being too arcane an issue for
>>> this group,

>> With 28 decimal digits used by default, few apps would care about
>> this anyway.

> Were you in the computer arithmetic area during the "base wars" of the
> 1960s and 1970s that culminated with binary winning out?

Yes, although I came in on the tail end of that and never actually used 
a non-binary machine.

> A lot of very well-respected numerical analysts said that larger bases
> led to a faster build-up of error (independent of the precision).  My
> limited investigations indicated that there was SOME truth in that,
> but it wasn't a major matter; I never saw the matter settled
> definitively.

My point was that 28 decimal digits of precision is far greater than 
supplied even by 64-bit binary floats today (let alone the smaller sizes 
in most-common use back in the 60s and 70s).  "Pollution" of low-order 
bits is far less of a real concern when there are some number of low-
order bits you don't care about at all.
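
For concreteness (a quick check of the current defaults):

    import sys
    from decimal import getcontext

    print(getcontext().prec)   # 28 -- default precision of the decimal context
    print(sys.float_info.dig)  # 15 -- decimal digits a 64-bit binary float carries reliably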

>>> and deal with the other.  In binary, calculating the mid-point
>>> of two numbers (a very common operation) is guaranteed to be within
>>> the range defined by those numbers, or to over/under-flow.
>>>
>>> Neither (x+y)/2.0 nor (x/2.0+y/2.0) is necessarily within the range
>>> (x,y) in decimal, even for the most respectable values of x and y.
>>> This was a MAJOR "gotcha" in the days before binary became standard,
>>> and will clearly return with decimal.

>> I view this as being an instance of "lose info when rounding does 
>> occur".  For example,
 
> No, absolutely NOT!

Of course it is.  If there were no rounding errors, the computed result 
would be exactly right -- that's darned near tautological too.  You 
snipped the examples I gave showing exactly where and how rounding error 
created the problems in (x+y)/2 and x/2+y/2 for some specific values of 
x and y using decimal arithmetic.  If you don't like those examples, 
supply your own, and if you get a similarly surprising result you'll 
find rounding error(s) occur(s) in yours too.

It so happens that rounding errors in binary fp can't lead to the same 
counterintuitive /outcome/, essentially because x+x == y+y implies x == 
y in base 2 fp, which is indeed a bit of magic specific to base 2.  The 
fact that there /do/ exist fp x and y such that x != y yet x+x == y+y in 
bases > 2 is entirely due to fp rounding error losing info.
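
Since the originals got snipped, here's a fresh sketch (a deliberately 
tiny precision, and values picked only because they happen to trigger 
the effect):

    from decimal import Decimal, localcontext

    with localcontext() as ctx:
        ctx.prec = 3                    # tiny precision makes the rounding visible
        x, y = Decimal("0.667"), Decimal("0.669")
        mid = (x + y) / 2               # x+y rounds to 1.34, so mid is 0.67 > y
        print(mid, x <= mid <= y)       # 0.67 False

        a, b = Decimal("6.66"), Decimal("6.67")
        print(a + a == b + b, a == b)   # True False -- doubling loses the distinction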

> This is an orthogonal matter,

Disagree.

> and is about the loss of an important invariant when using any base
> above 2.

It is that.

> Back in the days when there were multiple bases, virtually every
> programmer who wrote large numerical code got caught by it at least
> once, and many got caught several times (it has multiple guises).
> For example, take the following algorithm for binary chop:
>
> while 1 :
>     c = (a+b)/2
>     if f(c) < y :
>         if c == b :
>             break
>         b = c
>     else :
>         if c == a :
>             break
>         a = c
>
> That works in binary, but in no base above 2 (assuming that I haven't
> made a stupid error writing it down).  In THAT case, it is easy to fix
> for decimal, but there are ways that it can show up that can be quite
> tricky to fix.

If you know a < b, doing

    c = a + (b-a)/2

instead of

    c = (a+b)/2

at least guarantees (ignoring possible overflow) a <= c <= b.  As shown 
last time, it's not even always the case that (x+x)/2 == x in decimal fp 
(or in any fp base > 2, for that matter).
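
Roughly, with the same toy precision as above (again, the particular 
values are just ones that happen to show it):

    from decimal import Decimal, localcontext

    with localcontext() as ctx:
        ctx.prec = 3
        a, b = Decimal("0.667"), Decimal("0.669")
        print((a + b) / 2)        # 0.67  -- lands outside [a, b]
        print(a + (b - a) / 2)    # 0.668 -- stays inside [a, b]

        x = Decimal("0.667")
        print((x + x) / 2 == x)   # False -- x+x rounds to 1.33, halved is 0.665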


