[Python-ideas] Python Float Update
Steven D'Aprano
steve at pearwood.info
Mon Jun 1 16:58:06 CEST 2015
On Mon, Jun 01, 2015 at 06:27:57AM +0000, Nicholas Chammas wrote:
> Having decimal literals or something similar by default, though perhaps
> problematic from a backwards compatibility standpoint, is a) user friendly,
> b) easily understandable, and c) not surprising to beginners. None of these
> qualities apply to float literals.
I wish this myth about Decimals would die, because it isn't true. The
only advantage of base-10 floats over base-2 floats -- and I'll admit it
can be a big advantage -- is that many of the numbers we commonly care
about can be represented in Decimal exactly, but not as base-2 floats.
In every other way, Decimals are no more user friendly, understandable,
or unsurprising than floats. Decimals violate all the same rules of
arithmetic that floats do. This should not come as a surprise, since
decimals *are* floats, they merely use base 10 rather than base 2.
In the past, I've found that people are very resistant to this fact, so
I'm going to show a few examples of how Decimals violate the fundamental
laws of mathematics just as floats do. For those who already know this,
please forgive me belabouring the obvious.
In mathematics, adding anything other than zero to a number must give
you a different number. Decimals violate that expectation just as
readily as binary floats:
py> from decimal import Decimal as D
py> x = D(10)**30
py> x == x + 100 # should be False
True
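Here is a minimal sketch of why that happens. It assumes the documented default context precision of 28 significant digits, pinned explicitly so the result does not depend on the active context. The exact sum needs 31 digits, so the trailing 100 is rounded away. The binary comparison uses 2.0**53, the point past which a 64-bit float can no longer represent every integer. The variable names are only illustrative.

```python
from decimal import Decimal as D, localcontext

# Decimal: pin the precision to 28 digits (the documented default) so the
# demonstration does not depend on whatever context is currently active.
with localcontext() as ctx:
    ctx.prec = 28
    x = D(10) ** 30
    decimal_absorbed = (x + 100 == x)   # the exact sum would need 31 digits

# Binary float: above 2**53 adjacent floats are 2 apart, so adding 1
# cannot produce a new value and the sum rounds back to the original.
big = 2.0 ** 53
float_absorbed = (big + 1.0 == big)

print(decimal_absorbed, float_absorbed)   # True True
```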
Apart from zero, multiplying a number by its inverse should always give
one. Again, violated by decimals:
py> one_third = 1/D(3)
py> 3*one_third == 1
False
Inverting a number twice should give the original number back:
py> 1/(1/D(7)) == 7
False
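The two results above are easier to check by hand at a smaller precision. The sketch below assumes a context of five significant digits (my choice for readability, not something from the examples above) and shows the intermediate values that cause the mismatch.

```python
from decimal import Decimal as D, localcontext

with localcontext() as ctx:
    ctx.prec = 5                    # five significant digits, for readability
    one_third = 1 / D(3)            # 0.33333
    product = 3 * one_third         # 0.99999, not 1
    one_seventh = 1 / D(7)          # 0.14286 (rounded up from 0.142857...)
    round_trip = 1 / one_seventh    # 6.9999, not 7

print(one_third, product)           # 0.33333 0.99999
print(one_seventh, round_trip)      # 0.14286 6.9999
```

In both cases the first division has to discard digits, and the second operation has no way to recover them.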
Here's a violation of the Associativity Law, which states that (a+b)+c
should equal a+(b+c) for any values a, b, c:
py> a = D(1)/17
py> b = D(5)/7
py> c = D(12)/13
py> (a + b) + c == a + (b+c)
False
(For the record, it only took me two attempts, and a total of about 30
seconds, to find that example, so it's not particularly difficult to
come across such violations.)
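A smaller example makes the mechanism visible. This sketch assumes a context of three significant digits and values chosen by me so the rounding can be followed by hand; the binary float comparison at the end is the same effect in base 2.

```python
from decimal import Decimal as D, localcontext

with localcontext() as ctx:
    ctx.prec = 3
    a, b, c = D('1.00'), D('0.004'), D('0.004')
    left = (a + b) + c     # 1.004 rounds to 1.00, and then again to 1.00
    right = a + (b + c)    # b + c is exactly 0.008, and 1.008 rounds to 1.01

print(left, right, left == right)   # 1.00 1.01 False

# The same law fails for binary floats with everyday-looking inputs.
float_left = (0.1 + 0.2) + 0.3
float_right = 0.1 + (0.2 + 0.3)
print(float_left == float_right)    # False
```

Grouping the two small values first lets them add up to something large enough to survive the rounding; adding them one at a time loses both.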
Here's a violation of the Distributive Law, which states that a*(b+c)
should equal a*b + a*c:
py> a = D(15)/2
py> b = D(15)/8
py> c = D(1)/14
py> a*(b+c) == a*b + a*c
False
(I'll admit that was a bit trickier to find.)
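At low precision the same failure is easy to construct. This is a sketch assuming a context of two significant digits, with inputs I picked so that no rounding tie is involved.

```python
from decimal import Decimal as D, localcontext

with localcontext() as ctx:
    ctx.prec = 2
    a, b, c = D('1.1'), D('1.4'), D('1.4')
    factored = a * (b + c)      # 1.1 * 2.8 = 3.08, which rounds to 3.1
    expanded = a * b + a * c    # each product 1.54 rounds to 1.5; sum is 3.0

print(factored, expanded, factored == expanded)   # 3.1 3.0 False
```

The expanded form rounds twice before adding, the factored form rounds once after multiplying, and the two paths land on different results.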
This one is a bit subtle, and to make it easier to see what is going on
I will reduce the number of digits used. When you take the average of
two numbers x and y, mathematically the average must fall *between* x
and y. With base-2 floats, we can't guarantee that the average will be
strictly between x and y, but we can be sure that it will be either
between the two values, or equal to one of them.
But base-10 Decimal floats cannot even guarantee that. Sometimes the
calculated average falls completely outside of the inputs.
py> from decimal import getcontext
py> getcontext().prec = 3
py> x = D('0.516')
py> y = D('0.518')
py> (x+y)/2 # should be 0.517
Decimal('0.515')
This one is even worse:
py> getcontext().prec = 1
py> x = D('51.6')
py> y = D('51.8')
py> (x+y)/2 # should be 51.7
Decimal('5E+1')
Instead of the correct answer of 51.7, Decimal calculates the answer as
50 exactly.
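The three-digit case can be traced step by step: the sum 1.034 does not fit in three digits, so it is rounded to 1.03 before the division ever happens. The sketch below shows that, along with a commonly used rearrangement, x + (y - x)/2, which happens to give the expected answer for these inputs. The helper names naive_mid and offset_mid are hypothetical, and I am not claiming the rearrangement is safe in general. In particular it does not rescue the one-digit example, where the inputs themselves carry more digits than the context allows.

```python
from decimal import Decimal as D, localcontext

def naive_mid(x, y):
    # Textbook average: the sum is rounded before the division.
    return (x + y) / 2

def offset_mid(x, y):
    # Hypothetical alternative: step from x by half the (small) difference.
    return x + (y - x) / 2

with localcontext() as ctx:
    ctx.prec = 3
    x, y = D('0.516'), D('0.518')
    total = x + y               # 1.034 rounds to 1.03
    naive = naive_mid(x, y)     # 1.03 / 2 = 0.515, below both inputs
    offset = offset_mid(x, y)   # 0.516 + 0.001 = 0.517

print(total, naive, offset)     # 1.03 0.515 0.517
```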
> I always assumed that float literals were mostly an artifact of history or
> of some performance limitations. Free of those, why would a language choose
> them over decimal literals?
Performance and accuracy will always be better for binary floats. Binary
floats are faster, and have stronger error bounds and slower-growing
errors. Decimal floats suffer from the same problems as binary floats,
only more so, and are slower to boot.
> When does someone ever expect floating-point
> madness, unless they are doing something that is almost certainly not
> common, or unless they have been burned in the past?
> Every day another programmer gets bitten by floating point stupidities like
> this one <http://stackoverflow.com/q/588004/877069>. It would be a big win
> to kill this lame “programmer rite of passage” and give people numbers that
> work more like how they learned them in school.
There's a lot wrong with that.
- The sorts of errors we see with floats are not "madness", but the
completely logical consequences of what happens when you try to do
arithmetic in anything less than the full mathematical abstraction.
- And they aren't rare either -- they're incredibly common. Fortunately,
most of the time they don't matter, or aren't obvious, or both.
- Decimals don't behave like the numbers you learn in school either.
Floats are not real numbers, regardless of which base you use. And in
fact, the smaller the base, the smaller the errors. Binary floats are
better than decimals in this regard.
(Decimals *only* win out due to human bias: we don't care too much that
1/7 cannot be expressed exactly as a float using *either* binary or
decimal, but we do care about 1/10. And we conveniently ignore the case
of 1/3, because familiarity breeds contempt.)
- Being at least vaguely aware of floating point issues shouldn't be
difficult for anyone who has used a pocket calculator. And yet every day
brings in another programmer surprised by floats.
- It's not really a rite of passage; that implies that it is arbitrary
and imposed culturally. Float issues aren't arbitrary; they are baked
into the very nature of the universe.
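The point about 1/10 versus 1/3 can be checked with the standard fractions module, which converts a float or Decimal to the exact rational value it stores. This is only a sketch; the variable names are mine, and the Decimal division uses whatever context is active, although no finite precision can make 1/3 exact.

```python
from decimal import Decimal as D
from fractions import Fraction

tenth_binary = Fraction(0.1)          # exact value held by the binary float
tenth_decimal = Fraction(D('0.1'))    # exact value held by the Decimal
third_binary = Fraction(1 / 3)
third_decimal = Fraction(1 / D(3))

print(tenth_binary == Fraction(1, 10))    # False: base 2 cannot store 1/10
print(tenth_decimal == Fraction(1, 10))   # True: base 10 can
print(third_binary == Fraction(1, 3))     # False
print(third_decimal == Fraction(1, 3))    # False: neither base stores 1/3
```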
You cannot hope to perform infinitely precise real-number arithmetic
using just a finite number of bits of storage, no matter what system you
use. Fixed-point maths has its own problems, as does rational maths.
All you can do is choose to shift the errors from some calculations to
other calculations; you cannot eliminate them altogether.
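As one illustration of the trade-off with rational maths, here is a sketch using the standard fractions module. The helper newton_sqrt2 is hypothetical and the choice of Newton's iteration for the square root of 2 is mine. Every step is exact, but the denominator roughly doubles in length each time, and the result still never equals the irrational value it is chasing.

```python
from fractions import Fraction

def newton_sqrt2(steps):
    # Newton's iteration x -> (x + 2/x) / 2, carried out exactly.
    x = Fraction(1)
    for _ in range(steps):
        x = (x + 2 / x) / 2
    return x

approx = newton_sqrt2(6)
digits = len(str(approx.denominator))

print(newton_sqrt2(2))     # 17/12
print(digits > 20)         # True: storage grows quickly
print(approx * approx == 2)  # False: still not the square root of 2
```

The error has moved from rounding to unbounded memory and time, which is the kind of shift described above.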
--
Steve