PEP 327: Decimal Data Type

Stephen Horne steve at ninereeds.fsnet.co.uk
Tue Feb 3 20:59:41 EST 2004


On Mon, 02 Feb 2004 17:07:52 -0500, cookedm+news at physics.mcmaster.ca
(David M. Cooke) wrote:

>At some point, "Batista, Facundo" <FBatista at uniFON.com.ar> wrote:
>
>> danb_83 wrote: 
>>
>> #- On the other hand, when I say that I am 1.80 m tall, it doesn't imply
>> #- that human height comes in discrete packets of 0.01 m.  It
>> #- means that
>> #- I'm *somewhere* between 1.795 and 1.805 m tall, depending on my
>> #- posture and the time of day, and "1.80" is just a convenient
>> #- approximation.  And it wouldn't be inaccurate to express my height as
>> #- 0x1.CC (=1.796875) or (base 12) 1.97 (=1.7986111...) meters, because
>> #- these are within the tolerance of the measurement.  So number base
>> #- doesn't matter here.
>>
>> Are you saying that it's ok to store your number imprecisely because
>> you don't take good measurements?
>
>What we need for this is an interval type. 1.80 m shouldn't be stored
>as '1.80', but as '1.80 +/- 0.005', and operations such as addition
>and multiplication should propagate the intervals.

I disagree with this, not because it is a bad idea to keep track of
precision, but because this should not be a part of the float type or
of basic arithmetic operations.

When you write a value with its precision specified in the form of an
interval, that interval is a second number. The value with the
precision is a compound representation, built up using simpler
components. It doesn't mean that the components no longer have uses
outside of the compound. In Python, the same should apply - a numeric
type that can track precision sounds useful, but it shouldn't replace
the existing float.
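
To make that concrete, here is a minimal sketch of such a compound
type, built from two plain floats (the class name and interface are
my own invention, not anything from the PEP):

    class Interval:
        """A value plus an error bound - a compound of two plain floats."""
        def __init__(self, value, error):
            self.value = value   # best estimate
            self.error = error   # half-width of the interval
        def __add__(self, other):
            # Naive propagation: error magnitudes simply add.
            return Interval(self.value + other.value,
                            self.error + other.error)
        def __repr__(self):
            return "%g +/- %g" % (self.value, self.error)

    height = Interval(1.80, 0.005)
    step = Interval(0.30, 0.005)
    print(height + step)    # 2.1 +/- 0.01
    print(height.value)     # the component float is still usable on its own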

One good reason is simply that knowledge of the precision is only
sometimes useful. As an obvious example, what would be the point of
tracking the precision of the calculations in a 3D game? There is
none, because the precision information has no bearing on how the
image is rendered.

Besides this, there is a much more fundamental problem.

The whole point of using an imprecise representation is that
manipulating a perfect representation is impractical - mainly too slow.

It is true that the source data is usually approximate too, which
makes floats quite a good match for the physical measurements they
often represent. Still, if it were practical to do perfect arithmetic
on those approximate values, it would give slightly more precise
answers, because the arithmetic would not introduce additional
sources of error.

Having an approximate representation with an interval sounds good, but
remember that one error source is the arithmetic itself - e.g. 1.0 /
3.0 cannot be finitely represented in either binary or decimal without
error (except as a rational, of course).
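
A quick demonstration, using the Fraction type from Python's
fractions module for the exact rational:

    from fractions import Fraction

    approx = 1.0 / 3.0
    # What the binary float actually stores - not 1/3:
    print(Fraction(approx))     # 6004799503160661/18014398509481984
    # Exact rational arithmetic introduces no such error:
    print(Fraction(1, 3) * 3)   # 1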

Therefore, in answer to your question...

>How to do that is another question: for addition, do you add the
>magnitudes of the intervals, or use the square root of the sums of the
>squares, or something else? It greatly depends on what _type_ of error
>0.005 measures (is it the width of a Gaussian distribution? a uniform
>distribution? something skewed that's not representable by one
>number?).

None of these is sufficient - each may track the errors resulting
from measurement issues (if you choose the appropriate method for
your application), but none takes into account errors introduced by
the imprecision of the arithmetic itself. Furthermore, keeping track
of that imprecision exactly would require an infinitely precise
numeric representation for the interval - and if that were practical,
it would be far better to just use that representation for the value
itself.

This doesn't mean that tracking precision is a bad idea. It just means
that when it is done, the error interval itself should be imprecise.
You should have the guarantee that the real value is never going to be
outside of the given bounds, but not the guarantee that the bounds are
as close together as possible - the bounds should be allowed to get a
little further apart to allow for imprecision in the calculation of
the interval.
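
As a sketch of that guarantee, here is an interval addition that
nudges its bounds outward by one float step after each operation, so
that rounding in the bound arithmetic itself can never let the true
value escape (math.nextafter needs Python 3.9+; the class name and
interface are illustrative only):

    import math

    class OutwardInterval:
        """[lo, hi] bounds, widened to absorb the rounding of the
        bound arithmetic itself."""
        def __init__(self, lo, hi):
            self.lo = lo
            self.hi = hi
        def __add__(self, other):
            # The bound sums may themselves round; step each result one
            # representable float outward so the true sum stays inside.
            lo = math.nextafter(self.lo + other.lo, -math.inf)
            hi = math.nextafter(self.hi + other.hi, math.inf)
            return OutwardInterval(lo, hi)
        def __repr__(self):
            return "[%r, %r]" % (self.lo, self.hi)

    a = OutwardInterval(1.795, 1.805)    # 1.80 +/- 0.005
    b = OutwardInterval(0.295, 0.305)    # 0.30 +/- 0.005
    print(a + b)                         # a hair wider than [2.09, 2.11]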

And if the error interval is itself an approximation, why track it on
every single arithmetic operation? Unless you have a specific good
reason to do so, it makes much more sense to handle the precision
tracking at a higher level. And as those higher-level operations are
often going to be application specific, a single general-purpose
library for it (i.e. one not tailored to a particular type of task)
is IMO unlikely to work.

For instance, consider calculating and applying a 3D rotation matrix
to a vector. If you track errors on every float value, that is 9
error intervals for the matrix entries (due to limited-precision trig
functions etc.), 3 for the components of the input vector, around a
dozen for the intermediate results of the matrix multiplication, and
3 for the dimensions of the output vector. But the odds are that all
you want is a single float value - the maximum distance between the
real point and the point represented by the output vector - and you
can probably get a good value for that by multiplying the length of
the input vector by some 'potential error from rotation' constant.
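
For illustration, a sketch of that shortcut - the error constant here
is a made-up placeholder that a real application would have to
calibrate, not a derived bound (math.hypot with three arguments needs
Python 3.8+):

    import math

    # Assumed worst-case relative error of one rotation (trig
    # precision, matrix round-off and so on folded into one constant).
    ROTATION_EPS = 1e-12

    def rotate_z(v, angle):
        """Rotate a 3D point about the z axis."""
        c, s = math.cos(angle), math.sin(angle)
        x, y, z = v
        return (c * x - s * y, s * x + c * y, z)

    def rotation_error_bound(v):
        """One float instead of many intervals: the error estimate
        simply scales with the input vector's length."""
        return math.hypot(v[0], v[1], v[2]) * ROTATION_EPS

    p = (10.0, 2.0, -3.0)
    print(rotate_z(p, math.pi / 6), "+/-", rotation_error_bound(p))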

Incidentally, it would not always be appropriate to include arithmetic
errors in error intervals. For instance, some statistical interval
types do not guarantee that all values are within the interval range.
They may guarantee that 95% of values are within the interval, for
instance - _and_ that 5% of values are outside the interval. The 5%
outside is as important as the 95% inside, so there is no acceptable
direction to move the bounds a little 'just to be safe'.
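
For example, a rough sketch of such an interval, assuming normally
distributed values (the data is invented for illustration):

    import statistics

    data = [9.3, 9.7, 9.4, 9.9, 9.5, 9.6, 9.2, 9.8]
    mu = statistics.mean(data)
    sigma = statistics.stdev(data)
    # ~95% of a normal population lies within 1.96 sigma of the mean;
    # the remaining ~5% lies outside, by construction.
    print("95%% interval: %.3f +/- %.3f" % (mu, 1.96 * sigma))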

In some cases, you might even want to track the error interval (from
arithmetic error) for your error interval value. I can certainly
imagine a result with the form...

  The average widginess of a blodgit is 9.5 +/- 0.2
  95% differ from the average by less than 2.7 +/- 0.03

  Thus I can say that this randomly chosen blodgit has a
  widginess of (9.5 +/- 0.2) +/- (2.7 +/- 0.03) with 95% confidence.

You might even get results like that if you had estimated the average
and distribution of widginess from a sample of the blodgits - in which
case, you may still need to account for the arithmetic error, which
potentially requires another four values ;-)


-- 
Steve Horne

steve at ninereeds dot fsnet dot co dot uk


