[Python-ideas] Python Float Update

Mon Jun 1 16:58:06 CEST 2015

On Mon, Jun 01, 2015 at 06:27:57AM +0000, Nicholas Chammas wrote:

> Having decimal literals or something similar by default, though perhaps
> problematic from a backwards compatibility standpoint, is a) user friendly,
> b) easily understandable, and c) not surprising to beginners. None of these
> qualities apply to float literals.

I wish this myth about Decimals would die, because it isn't true. The 
only advantage of base-10 floats over base-2 floats -- and I'll admit it 
can be a big advantage -- is that many of the numbers we commonly care 
about can be represented in Decimal exactly, but not as base-2 floats. 
In every other way, Decimals are no more user friendly, understandable, 
or unsurprising than floats. Decimals violate all the same rules of 
arithmetic that floats do. This should not come as a surprise, since 
decimals *are* floats, they merely use base 10 rather than base 2.
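The one real advantage, exact representation of values such as 0.1, can be checked directly. This is a minimal sketch that assumes Python 3 and only the standard library. `Decimal(0.1)` shows the exact value of the binary float nearest to 0.1, while `Decimal('0.1')` holds one tenth exactly. The helper `terminates_in_base` is hypothetical and not part of any library. It relies on a standard fact: a fraction p/q in lowest terms has a finite expansion in base b exactly when every prime factor of q also divides b.

```python
from decimal import Decimal
from fractions import Fraction
from math import gcd

def terminates_in_base(frac: Fraction, base: int) -> bool:
    """Hypothetical helper: True if frac has a finite expansion in `base`.

    Repeatedly strip from the (lowest-terms) denominator every factor it
    shares with the base; the expansion terminates iff nothing is left.
    """
    q = frac.denominator
    g = gcd(q, base)
    while g > 1:
        q //= g
        g = gcd(q, base)
    return q == 1

# The float literal 0.1 is not one tenth; Decimal(float) reveals its exact value.
print(Decimal(0.1))    # 0.1000000000000000055511151231257827021181583404541015625
print(Decimal('0.1'))  # 0.1

# 1/10 terminates only in base 10; 1/3 and 1/7 terminate in neither base.
for f in (Fraction(1, 10), Fraction(1, 3), Fraction(1, 7), Fraction(3, 8)):
    print(f, "base 2:", terminates_in_base(f, 2), "base 10:", terminates_in_base(f, 10))
```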

In the past, I've found that people are very resistant to this fact, so 
I'm going to show a few examples of how Decimals violate the fundamental 
laws of mathematics just as floats do. For those who already know this, 
please forgive me belabouring the obvious.

In mathematics, adding anything other than zero to a number must give 
you a different number. Decimals violate that expectation just as 
readily as binary floats:

py> from decimal import Decimal as D
py> x = D(10)**30
py> x == x + 100  # should be False
True
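The equality holds because the default decimal context keeps only 28 significant digits, and 10**30 + 100 needs 31. This is a minimal sketch that uses `decimal.localcontext` to change the precision temporarily. At 40 digits the sum becomes distinguishable, but the problem only moves further out. Binary floats behave the same way once the addend drops below half the spacing between neighbouring values.

```python
from decimal import Decimal as D, localcontext

x = D(10) ** 30

with localcontext() as ctx:
    ctx.prec = 28                # the default precision, set explicitly
    print(x + 100 == x)          # True: the sum needs 31 digits, only 28 are kept

with localcontext() as ctx:
    ctx.prec = 40                # enough digits for this particular sum
    print(x + 100 == x)          # False
    y = D(10) ** 50
    print(y + 100 == y)          # True again: the problem has only moved

# Binary floats: neighbouring doubles near 1e16 are 2 apart, so adding 1 is lost.
print(1e16 + 1 == 1e16)          # True
```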

Apart from zero, multiplying a number by its inverse should always give 
one. Again, violated by decimals:

py> one_third = 1/D(3)
py> 3*one_third == 1
False
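One way to compare the two bases on this law is to search for the smallest integer n where n * (1/n) != 1. This is a minimal sketch, and `first_bad_inverse` is a hypothetical helper. For IEEE 754 doubles, 49 * (1/49) already evaluates to 0.9999999999999999. At Decimal's default 28 digits the search stops at 3, which is the `one_third` case shown above.

```python
from decimal import Decimal, localcontext

def first_bad_inverse(one, limit=1000):
    """Hypothetical helper: smallest n in 1..limit with n * (1/n) != 1, else None.

    `one` is the unit of the number type under test:
    1.0 for binary floats, Decimal(1) for decimals.
    """
    for n in range(1, limit + 1):
        if n * (one / n) != one:
            return n
    return None

print(first_bad_inverse(1.0))              # binary floats

with localcontext() as ctx:
    ctx.prec = 28                          # Decimal's default precision
    print(first_bad_inverse(Decimal(1)))   # 3
```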

Inverting a number twice should give the original number back:

py> 1/(1/D(7)) == 7
False

Here's a violation of the Associativity Law, which states that (a+b)+c 
should equal a+(b+c) for any values a, b, c:

py> a = D(1)/17
py> b = D(5)/7
py> c = D(12)/13
py> (a + b) + c == a + (b+c)
False

(For the record, it only took me two attempts, and a total of about 30 
seconds, to find that example, so it's not particularly difficult to 
come across such violations.)
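A short random search shows how easy such violations are to find. This is a minimal sketch, and `find_associativity_violation` is hypothetical. The value ranges (numerators 1 to 20, denominators 2 to 20) and the fixed seed are illustrative assumptions. It returns the attempt number and the offending triple. The last line shows the familiar binary-float counterpart.

```python
import random
from decimal import Decimal, localcontext

def find_associativity_violation(tries=1000, seed=1):
    """Hypothetical helper: search random fractions i/j for a triple
    with (a + b) + c != a + (b + c), at Decimal's default 28 digits."""
    rng = random.Random(seed)            # fixed seed so the search is repeatable
    with localcontext() as ctx:
        ctx.prec = 28
        for attempt in range(1, tries + 1):
            a, b, c = (Decimal(rng.randint(1, 20)) / rng.randint(2, 20)
                       for _ in range(3))
            if (a + b) + c != a + (b + c):
                return attempt, (a, b, c)
    return None

print(find_associativity_violation())

# Binary floats fail just as easily.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))   # False
```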

Here's a violation of the Distributive Law, which states that a*(b+c) 
should equal a*b + a*c:

py> a = D(15)/2
py> b = D(15)/8
py> c = D(1)/14
py> a*(b+c) == a*b + a*c
False

(I'll admit that was a bit trickier to find.)
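To get a rough sense of how often the distributive law fails, you can count failures over random triples in both bases. This is a minimal sketch. `count_distributive_failures`, `random_decimal` and `random_float` are hypothetical helpers, and their value ranges and seed are arbitrary assumptions, so the exact counts mean little beyond "not zero".

```python
import random
from decimal import Decimal, localcontext

def count_distributive_failures(make, trials=2000, seed=2):
    """Hypothetical helper: count random triples where a*(b+c) != a*b + a*c.

    `make(rng)` builds one random number of the type under test.
    """
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        a, b, c = make(rng), make(rng), make(rng)
        if a * (b + c) != a * b + a * c:
            failures += 1
    return failures

def random_decimal(rng):
    return Decimal(rng.randint(1, 20)) / rng.randint(2, 20)

def random_float(rng):
    return rng.randint(1, 20) / rng.randint(2, 20)

trials = 2000
with localcontext() as ctx:
    ctx.prec = 28                      # Decimal's default precision
    print("Decimal:", count_distributive_failures(random_decimal, trials), "of", trials)
print("float:  ", count_distributive_failures(random_float, trials), "of", trials)
```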

This one is a bit subtle, and to make it easier to see what is going on 
I will reduce the number of digits used. When you take the average of 
two numbers x and y, mathematically the average must fall *between* x 
and y. With base-2 floats, we can't guarantee that the average will be 
strictly between x and y, but we can be sure that it will be either 
between the two values, or equal to one of them. 

But base-10 Decimal floats cannot even guarantee that. Sometimes the 
calculated average falls completely outside of the inputs.

py> from decimal import getcontext
py> getcontext().prec = 3
py> x = D('0.516')
py> y = D('0.518')
py> (x+y)/2  # should be 0.517
Decimal('0.515')

This one is even worse:

py> getcontext().prec = 1
py> x = D('51.6')
py> y = D('51.8')
py> (x+y)/2  # should be 51.7
Decimal('5E+1')

Instead of the correct answer of 51.7, Decimal calculates the answer as 
50 exactly.
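The underlying reason is that doubling is exact in base 2, because it only changes the exponent, but not in base 10. At three digits, 0.516 + 0.518 = 1.034 has to be rounded to 1.03 before the division happens. In binary, x <= y gives 2x <= x + y <= 2y, and 2x and 2y are exactly representable. Rounding is monotonic, so as long as nothing overflows, the rounded sum stays between 2x and 2y and halving brings it back between x and y. This is a minimal sketch of that check, using a hypothetical helper `midpoint_in_range` and an arbitrary batch of random doubles.

```python
import random
from decimal import Decimal as D, localcontext

def midpoint_in_range(x, y):
    """Hypothetical helper: does the naive midpoint (x + y) / 2 lie in [min, max]?"""
    m = (x + y) / 2
    return min(x, y) <= m <= max(x, y)

# Decimal at 3 significant digits: the sum 1.034 is rounded to 1.03 first.
with localcontext() as ctx:
    ctx.prec = 3
    print((D('0.516') + D('0.518')) / 2)              # 0.515
    print(midpoint_in_range(D('0.516'), D('0.518')))  # False

# Binary floats: doubling is exact, so the rounded sum stays within [2x, 2y].
rng = random.Random(3)
pairs = [(rng.uniform(-1e6, 1e6), rng.uniform(-1e6, 1e6)) for _ in range(100_000)]
print(all(midpoint_in_range(a, b) for a, b in pairs))  # True
```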

> I always assumed that float literals were mostly an artifact of history or
> of some performance limitations. Free of those, why would a language choose
> them over decimal literals? 

Performance and accuracy will always be better for binary floats. Binary 
floats are faster, and have stronger error bounds and slower-growing 
errors. Decimal floats suffer from the same problems as binary floats, 
only more so, and are slower to boot.

> When does someone ever expect floating-point
> madness, unless they are doing something that is almost certainly not
> common, or unless they have been burned in the past?
> Every day another programmer gets bitten by floating point stupidities like
> this one <http://stackoverflow.com/q/588004/877069>. It would be a big win
> to kill this lame “programmer rite of passage” and give people numbers that
> work more like how they learned them in school.

There's a lot wrong with that.

- The sorts of errors we see with floats are not "madness", but the 
completely logical consequences of what happens when you try to do 
arithmetic in anything less than the full mathematical abstraction.

- And they aren't rare either -- they're incredibly common. Fortunately, 
most of the time they don't matter, or aren't obvious, or both.

- Decimals don't behave like the numbers you learn in school either. 
Floats are not real numbers, regardless of which base you use. And in 
fact, the smaller the base, the smaller the errors. Binary floats are 
better than decimals in this regard.

(Decimals *only* win out due to human bias: we don't care too much that 
1/7 cannot be expressed exactly as a float using *either* binary or 
decimal, but we do care about 1/10. And we conveniently ignore the case 
of 1/3, because familiarity breeds contempt.)

- Being at least vaguely aware of floating point issues shouldn't be 
difficult for anyone who has used a pocket calculator. And yet every day 
brings in another programmer surprised by floats.

- It's not really a rite of passage; that implies that it is arbitrary 
and imposed culturally. Float issues aren't arbitrary; they are baked 
into the very nature of the universe.
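The point that a smaller base gives smaller errors can be made concrete by measuring the gap between neighbouring representable numbers, relative to their size. This is a minimal sketch. It assumes Python 3.9+ for `math.ulp` and `math.nextafter`. It uses 16 decimal digits as a rough match for a 53-bit double; the IEEE 754 decimal64 format also carries 16 digits. Both helpers are hypothetical. Across one step of the base, the binary relative gap varies by a factor of about 2, while the decimal gap varies by about 10. Its worst case is also several times larger.

```python
import math
from decimal import Decimal, localcontext

def rel_spacing_binary(x: float) -> float:
    """Hypothetical helper: gap to the next double above x, relative to x."""
    return math.ulp(x) / x

def rel_spacing_decimal(x: Decimal, prec: int = 16) -> float:
    """Hypothetical helper: gap to the next decimal above x at `prec` digits, relative to x."""
    with localcontext() as ctx:
        ctx.prec = prec
        x = +x                              # round x to the working precision
        return float((x.next_plus() - x) / x)

# Worst case (at a power of the base) and best case (just below the next power).
print("binary :", rel_spacing_binary(1.0), rel_spacing_binary(math.nextafter(2.0, 0.0)))
print("decimal:", rel_spacing_decimal(Decimal(1)), rel_spacing_decimal(Decimal('9.999999999999999')))
```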

You cannot hope to perform infinitely precise real-number arithmetic 
using just a finite number of bits of storage, no matter what system you 
use. Fixed-point maths has its own problems, as does rational maths.
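Both alternatives are easy to demonstrate with the standard library. This is a minimal sketch. Storing money as integer cents is an illustrative fixed-point choice. Newton's method for the square root of 2 is a standard example of exact rational arithmetic whose denominators roughly double in length at every step without ever reaching the irrational answer.

```python
from fractions import Fraction

# Fixed point: money as integer cents. Splitting $100.00 three ways loses a cent.
total_cents = 100_00
share = total_cents // 3
print(share, 3 * share, total_cents - 3 * share)   # 3333 9999 1

# Rationals: every step is exact, but the numbers keep growing and x*x never equals 2.
x = Fraction(1)
for step in range(1, 7):
    x = (x + 2 / x) / 2                 # Newton's iteration for sqrt(2)
    print(step, x.denominator.bit_length(), x * x == 2)
```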

All you can do is choose to shift the errors from some calculations to 
other calculations; you cannot eliminate them altogether.

-- 
Steve