[Tutor] range function and floats?
Steven D'Aprano
steve at pearwood.info
Sat Jan 8 06:35:51 CET 2011
Wayne Werner wrote:
> On Wed, Jan 5, 2011 at 4:59 PM, Steven D'Aprano <steve at pearwood.info> wrote:
>
>> Wayne Werner wrote:
>>
>>> <snip>
>>> I never said rounding errors - I said "pesky floating point errors".
>> Which ARE rounding errors. They're *all* rounding errors, caused by the
>> same fundamental issue -- the impossibility of representing some specific
>> exact number in the finite number of bits, or digits, available.
>>
>> Only the specific numbers change, not the existence of the errors.
>
>
> So truncation == rounding. I can agree with that, though they've always
> seemed distinct entities before, because you can round up or round down, but
> truncation simply removes what you don't want, which is equivalent to
> rounding down at whatever precision you want.
Well, technically truncation is a special case of rounding: round
towards zero. When you round, you are throwing away information: the
number you have might have (say) 20 digits of precision, and you only
need, or want, or can take (say) 18 digits. (Or bits, for binary
numbers, or whatever base you are using. There were some early Russian
computers that used base three, and many early Western machines used
base 10.) So you have to throw away two digits. How you throw them away
is up to you. There are five basic types of rounding:
1 round towards positive infinity (take the ceiling);
2 round towards negative infinity (take the floor);
3 round towards zero (truncate);
4 round away from zero (like ceil for +ve numbers and floor for -ve);
5 round towards the nearest integer.
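The first four modes map directly onto functions Python already ships
with; a quick sketch (the standard library has no single function for
round-away-from-zero, so that one is written by hand here):

```python
import math

x, y = 2.7, -2.7

# 1. Round towards positive infinity (ceiling).
print(math.ceil(x), math.ceil(y))    # 3 -2

# 2. Round towards negative infinity (floor).
print(math.floor(x), math.floor(y))  # 2 -3

# 3. Round towards zero (truncate).
print(math.trunc(x), math.trunc(y))  # 2 -2

# 4. Round away from zero: ceil for positives, floor for negatives.
def away_from_zero(v):
    return math.ceil(v) if v >= 0 else math.floor(v)

print(away_from_zero(x), away_from_zero(y))  # 3 -3

# 5. Round to the nearest integer (ties need a separate policy).
print(round(2.3), round(2.7))  # 2 3
```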
Number five is interesting, because numbers of the form N.5 are exactly
half-way between two integers, and so you have to choose a strategy for
breaking ties:
5a always round up (what you probably learned in school);
5b always round down;
5c round towards zero;
5d round away from zero;
5e round up if the result will be even, otherwise down;
5f round up if the result will be odd, otherwise down;
5g round up or down at random;
5h alternate between rounding up and rounding down;
5a introduces a small bias in the result: assuming the numbers you round
are randomly distributed, you will tend to increase them more often than
decrease them. 5b is the same, only reversed.
5c and 5d are overall symmetrical, but they introduce a bias in positive
numbers, and an equal but reversed bias in negative numbers.
5e and 5f are symmetrical, as is 5g provided the random number generator
is fair. Likewise for 5h. Provided the numbers you deal with are
unbiased, they won't introduce any bias.
5e is also interesting. It is sometimes called "statistician's
rounding", but more often "banker's rounding" even though there is no
evidence that it was ever used by bankers until the advent of modern
computers.
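The decimal module exposes several of these tie-breaking strategies as
named rounding modes; a small sketch (careful: decimal's ROUND_HALF_UP
means ties go *away from zero*, i.e. 5d, not the schoolbook 5a):

```python
from decimal import Decimal, ROUND_HALF_UP, ROUND_HALF_DOWN, ROUND_HALF_EVEN

n = Decimal("2.5")  # exactly half-way between 2 and 3

# 5d: ties away from zero.
print(n.quantize(Decimal("1"), rounding=ROUND_HALF_UP))    # 3
# 5c: ties towards zero.
print(n.quantize(Decimal("1"), rounding=ROUND_HALF_DOWN))  # 2
# 5e: ties to the nearest even digit ("banker's rounding").
print(n.quantize(Decimal("1"), rounding=ROUND_HALF_EVEN))  # 2
print(Decimal("3.5").quantize(Decimal("1"), rounding=ROUND_HALF_EVEN))  # 4
```

Python 3's built-in round() also uses 5e, which surprises people who
expect the schoolbook rule: round(2.5) gives 2, not 3.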
The bias introduced by a poor choice of rounding can be significant. In
1982, the Vancouver Stock Exchange started a new index with an initial
value of 1000.000. After 22 months it had fallen to approximately 520
points, during a period when most stock prices were increasing. It
turned out that the index was calculated by always rounding down to
three decimal places, thousands of times each day. The correct value of
the index should have been just under 1100. The accumulated rounding
error from over half a million calculations in 22 months was enough to
introduce an error of nearly 580 points -- a relative error of just
over 50%.
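The mechanism is easy to reproduce: repeatedly truncate a running value
to three decimal places and the downward bias accumulates, even though
each individual error is less than 0.001. A rough sketch (the figures
here are made up for illustration, not the exchange's actual data):

```python
import random
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_EVEN

random.seed(42)
# Simulate many tiny index updates: truncation vs. round-half-to-even.
truncated = rounded = Decimal("1000.000")
q = Decimal("0.001")
for _ in range(100000):
    delta = Decimal(str(random.uniform(-0.0005, 0.0005)))
    truncated = (truncated + delta).quantize(q, rounding=ROUND_DOWN)
    rounded = (rounded + delta).quantize(q, rounding=ROUND_HALF_EVEN)

print(truncated)  # drifts well below 1000
print(rounded)    # stays close to 1000
```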
> Having re-read and thought about it for a while, I think my argument simply
> distills down to this: using Decimal both allows you control over your
> significant figures,
In Python, Decimal gives you more control over precision and rounding
than binary floats. If you're programming in a low-level language that
gives you better access to the floating point routines, binary floats
give you almost as much control. The only difference I'm aware of is
that the Decimal module lets you choose any arbitrary number of
significant digits, while low-level floats only have a choice of certain
fixed number of bits. The IEEE 754 standard mandates half precision (16
bits), single (32 bits), double (64 bits, or what Python uses for
floats) and quadruple (128 bits). Not all of those bits are available
for precision: one bit is used for the sign and some are used for the
exponent. E.g. doubles have 53 bits of precision (except for
denormalised numbers, which have fewer).
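Python exposes these figures for its own floats; a quick check, assuming
CPython's float is an IEEE 754 double (true on virtually every
platform):

```python
import sys

print(sys.float_info.mant_dig)  # 53 -- bits of precision in a double
print(sys.float_info.dig)       # 15 -- decimal digits you can trust
print(sys.float_info.max_exp)   # 1024 -- largest binary exponent

# By contrast, the decimal module lets you pick any precision you like.
from decimal import getcontext
getcontext().prec = 80  # 80 significant decimal digits, just because we can
```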
> and (at least for me) *requires* you to think about
> what sort of truncation/rounding you will experience, and let's be honest -
> usually the source of errors is we, the programmers, not thinking enough
> about precision - and the result of this thought process is usually the
> elimination, not of truncation/rounding, but of not accounting for these
> errors. Which, to me, equates to "eliminating those pesky floating point
> errors".
You can't eliminate rounding errors unless you have effectively infinite
precision, which even at the cheap prices of RAM these days, would be
quite costly :)
But what you can do is *control* how much rounding error you get. This
is not as easy as it might seem though... one problem is the so-called
"Table-maker's Dilemma" (table as in a table of numbers): in general,
there is no way of knowing how many extra digits you need to calculate
in order to correctly round a mathematical function.
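With decimal you at least control the working precision and rounding
mode explicitly. A sketch of the usual defensive trick -- compute with
a few guard digits, then round once at the end (the dilemma is exactly
that for transcendental functions no fixed number of guard digits is
guaranteed to be enough; careful_div is a made-up name here):

```python
from decimal import Decimal, getcontext, localcontext

getcontext().prec = 10  # the precision we actually want

def careful_div(a, b, guard=5):
    """Divide with extra guard digits, then round once to the
    outer context's precision (unary + applies the active context)."""
    with localcontext() as ctx:
        ctx.prec = getcontext().prec + guard
        q = Decimal(a) / Decimal(b)
    return +q  # rounded to 10 digits under the outer context

print(careful_div(1, 3))  # 0.3333333333
print(careful_div(2, 3))  # 0.6666666667
```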
--
Steven