[Tutor] range function and floats?

Steven D'Aprano steve at pearwood.info
Sat Jan 8 06:35:51 CET 2011


Wayne Werner wrote:
> On Wed, Jan 5, 2011 at 4:59 PM, Steven D'Aprano <steve at pearwood.info> wrote:
> 
>> Wayne Werner wrote:
>>
>>> <snip>
>>> I never said rounding errors - I said "pesky floating point errors". When
>> Which ARE rounding errors. They're *all* rounding errors, caused by the
>> same fundamental issue --  the impossibility of representing some specific
>> exact number in the finite number of bits, or digits, available.
>>
>> Only the specific numbers change, not the existence of the errors.
> 
> 
> So truncation == rounding. I can agree with that, though they've always
> seemed distinct entities before, because you can round up or round down, but
> truncation simply removes what you don't want, which is equivalent to
> rounding down at whatever precision you want.

Well, technically truncation is a special case of rounding: round 
towards zero. When you round, you are throwing away information: the 
number you have might have (say) 20 digits of precision, and you only 
need, or want, or can take (say) 18 digits. (Or bits, for binary 
numbers, or whatever base you are using. There were some early Russian 
computers that used base three, and many early Western machines worked 
in decimal, etc.) So you have to throw away two digits. How you throw 
them away is up to you. There are five basic types of rounding (a quick 
Python sketch follows the list):

1 round towards positive infinity (take the ceiling);
2 round towards negative infinity (take the floor);
3 round towards zero (truncate);
4 round away from zero (like ceil for +ve numbers and floor for -ve);
5 round towards the nearest integer.
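
For illustration, here's a quick sketch using Python's decimal module, 
which supports all five directly (the sample values are made up; the 
ROUND_* names are the module's own constants):

from decimal import (Decimal, ROUND_CEILING, ROUND_FLOOR,
                     ROUND_DOWN, ROUND_UP, ROUND_HALF_EVEN)

modes = [("towards +infinity (ceiling)", ROUND_CEILING),
         ("towards -infinity (floor)", ROUND_FLOOR),
         ("towards zero (truncate)", ROUND_DOWN),
         ("away from zero", ROUND_UP),
         ("to nearest", ROUND_HALF_EVEN)]

for x in (Decimal("2.567"), Decimal("-2.567")):
    for name, mode in modes:
        # quantize() keeps two decimal places and uses the given
        # rounding mode to decide what happens to the rest.
        print(name, x.quantize(Decimal("0.01"), rounding=mode))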

Number five is interesting, because numbers of the form N.5 are exactly 
half-way between two integers, and so you have to choose a strategy for 
breaking ties:

5a always round up (what you probably learned in school);
5b always round down;
5c round towards zero;
5d round away from zero;
5e round up if the result will be even, otherwise down;
5f round up if the result will be odd, otherwise down;
5g round up or down at random;
5h alternate between rounding up and rounding down.
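
For what it's worth, Python 3's built-in round() uses 5e for exactly 
representable ties, and the decimal module lets you ask for other 
strategies by name. A tiny sketch (the sample values are arbitrary; 
note that decimal's ROUND_HALF_UP is strictly 5d, ties away from zero, 
which matches 5a for positive numbers):

from decimal import Decimal, ROUND_HALF_UP, ROUND_HALF_EVEN

for s in ("0.5", "1.5", "2.5", "3.5"):
    d = Decimal(s)
    print(s,
          d.quantize(Decimal("1"), rounding=ROUND_HALF_UP),    # 1 2 3 4
          d.quantize(Decimal("1"), rounding=ROUND_HALF_EVEN))  # 0 2 2 4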

5a introduces a small bias in the result: assuming the numbers you round 
are randomly distributed, you will tend to increase them more often than 
you decrease them. 5b is the same, only reversed.

5c and 5d are overall symmetrical, but they introduce a bias in positive 
numbers, and an equal but reversed bias in negative numbers.

5e and 5f are symmetrical, as is 5g provided the random number generator 
is fair. Likewise for 5h. Provided the numbers you deal with are 
unbiased, they won't introduce any bias.
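
A rough way to see the bias is to round a pile of exact .5 ties both 
ways and compare the accumulated shift (the values below are invented 
for the demonstration):

import random
from decimal import Decimal, ROUND_HALF_UP, ROUND_HALF_EVEN

random.seed(1)
# 10,000 positive values that land exactly half-way between integers.
ties = [Decimal(random.randrange(1000)) + Decimal("0.5")
        for i in range(10000)]

def net_shift(values, mode):
    # Sum of (rounded - original); zero would mean no overall bias.
    return sum(v.quantize(Decimal("1"), rounding=mode) - v
               for v in values)

print(net_shift(ties, ROUND_HALF_UP))    # exactly +5000: every tie went up
print(net_shift(ties, ROUND_HALF_EVEN))  # close to zero: ups and downs cancel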

5e is also interesting. It is sometimes called "statistician's 
rounding", but more often "banker's rounding" even though there is no 
evidence that it was ever used by bankers until the advent of modern 
computers.


The bias from a poor choice of rounding can be significant. In 1982 the 
Vancouver Stock Exchange started a new index with an initial value of 
1000.000. After 22 months it had fallen to approximately 520 points, 
during a period when most stock prices were increasing. It turned out 
that the index was calculated by always rounding down to three decimal 
places, thousands of times each day. The correct value of the index 
should have been just under 1100. The accumulated error from over half 
a million calculations in 22 months came to nearly 580 points, a 
relative error of just over 50%.
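
That's not the actual index calculation, but a toy model along the same 
lines shows how quickly the loss builds up (the step sizes and the 
number of updates below are invented; it takes a few seconds to run):

import random
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_EVEN

random.seed(42)
truncated = rounded = Decimal("1000.000")
for i in range(500000):    # roughly "half a million calculations"
    # A small, exactly representable change of up to +/- 0.005.
    change = Decimal(random.randrange(-5000, 5001)) / 1000000
    truncated = (truncated + change).quantize(Decimal("0.001"),
                                              rounding=ROUND_DOWN)
    rounded = (rounded + change).quantize(Decimal("0.001"),
                                          rounding=ROUND_HALF_EVEN)

print(truncated)   # drifts a couple of hundred points low
print(rounded)     # stays close to 1000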


> Having re-read and thought about it for a while, I think my argument simply
> distills down to this: using Decimal both allows you control over your
> significant figures, 

In Python, Decimal gives you more control over precision and rounding 
than binary floats. If you're programming in a low-level language that 
gives you direct access to the floating point routines, binary floats 
give you almost as much control. The only difference I'm aware of is 
that the Decimal module lets you choose any arbitrary number of 
significant digits, while low-level floats only come in a few fixed 
sizes. The IEEE 754 standard defines half precision (16 bits), single 
(32 bits), double (64 bits, which is what Python uses for floats) and 
quadruple (128 bits). Not all of those bits are available for 
precision: one bit is used for the sign and some are used for the 
exponent. For example, doubles have 53 bits of precision (except for 
denormalised numbers, which have fewer).
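
To make that concrete (nothing exotic here; sys.float_info, struct and 
decimal contexts are all standard library):

import struct
import sys
from decimal import Decimal, getcontext

print(sys.float_info.mant_dig)   # 53: the fixed precision of a Python float

# Squeezing the same value through a 32-bit single throws more digits away.
print(struct.unpack('f', struct.pack('f', 1.0 / 3.0))[0])

getcontext().prec = 80           # Decimal: pick any number of digits you like
print(Decimal(1) / Decimal(3))   # 0.333...3 to 80 significant digits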


> and (at least for me) *requires* you to think about
> what sort of truncation/rounding you will experience, and let's be honest -
> usually the source of errors is we, the programmers, not thinking enough
> about precision - and the result of this thought process is usually the
> elimination, not of truncation/rounding, but of not accounting for these
> errors. Which, to me, equates to "eliminating those pesky floating point
> errors".

You can't eliminate rounding errors unless you have effectively infinite 
precision, which, even at today's cheap RAM prices, would be quite 
costly :)

But what you can do is *control* how much rounding error you get. This 
is not as easy as it might seem though... one problem is the so-called 
"Table-maker's Dilemma" (table as in a table of numbers): in general, 
there is no way of knowing how many extra digits you need to calculate 
in order to correctly round a mathematical function.
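
You can see the "control it, don't eliminate it" point with the decimal 
module: crank up the precision and the error shrinks, but for a number 
like 1/7 it never reaches zero (the precisions below are arbitrary):

from decimal import Decimal, localcontext
from fractions import Fraction

exact = Fraction(1, 7)
for digits in (7, 28, 60):
    with localcontext() as ctx:
        ctx.prec = digits                  # digits of precision for this block
        approx = Decimal(1) / Decimal(7)
    error = Fraction(str(approx)) - exact  # the exact leftover rounding error
    print(digits, approx, float(error))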




-- 
Steven
