PEP 327: Decimal Data Type

Mon Feb 9 12:47:34 EST 2004

On Fri, 06 Feb 2004 20:25:21 +0100, anton at vredegoor.doge.nl (Anton Vredegoor) wrote:

>On 6 Feb 2004 17:03:57 GMT, bokr at oz.net (Bengt Richter) wrote:
>
>>The _meaning_ of numbers that are guaranteed to fall into known exact intervals
>>in terms of representing measurements, measurement errors, statistics of the
>>errors, etc. is a separate matter from keeping track of exact intervals during
>>computation. These concerns should not be confused, IMO, though they inevitably
>>arise together in thinking about computing with real-life measurement values.
>
>(Warning, naive hobbyist input, practicality: undefined)
>
>One possible option would be to provide for some kind of random
>rounding routine for some of the least significant bits of a floating
>point value. The advantage would be that this would also be usable for
>DSP-like computations that are used in music programming (volume
>adjustments) or in digital video (image rotation). 
>
I can't spend a lot of time on this right now, but this reminds me of
a time when I tried (successfully, IMO) to explain why feeding a simulation
system data with a small amount of noise added gave more accurate results
than feeding it exact data.

The reason has to do with quantization (which was part of the system being
simulated, and which could be fed with highly accurate world-sim values plus
noise). I.e., measurements are always represented digitally, with the least
significant bit representing some defined amount of the measured quantity.
This means measurement information below that is lost (or at least one bit
below that, depending on the device).

The result is that a statistical mean (or other integrating process) of samples
cannot, by itself, recover the bits lost in quantizing. If you feed a simulator
the same accurate value multiple times, you get identical, identically biased
quantized values. If you add a small amount of noise, you get a few neighboring
quantized values in some proportion, and their mean is a better estimate of the
true (unquantized) value than a mean of quantized values with no noise -- where
all the quantized values are exactly equal and all biased. The effect can be
amplified if the input is feeding a sensitive calculation such as the inversion
of a near-singular matrix, and can make the difference between usable and
useless results.

An example using int as the quantization function:

 >>> import random
 >>> def simval(val, noise=1.0):
 ...     return val + noise*random.random()
 ...
 >>> def simulator(val, noise, trials=1000):
 ...     return sum([int(simval(val, noise)) for i in xrange(trials)])/float(trials)
 ...
 >>> for i in xrange(10): print simulator(1.3, 0.0),
 ...
 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
 >>> for i in xrange(10): print simulator(1.3, 1.0),
 ...
 1.295 1.293 1.284 1.307 1.3 1.292 1.322 1.291 1.322 1.315
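
Why the noisy mean lands near 1.3: with uniform noise in [0, 1), int(1.3 + u)
is 2 exactly when u >= 0.7, i.e. with probability 0.3, and 1 otherwise, so the
expected quantized value is 0.7*1 + 0.3*2 = 1.3. For anyone running a current
Python, here is a minimal sketch of the same experiment. The rng parameter,
the fixed seed and the larger trial count are assumptions added for
repeatability; they are not part of the session above.

```python
import random

def simval(val, noise=1.0, rng=random):
    # True value plus uniform noise in [0, noise)
    return val + noise * rng.random()

def simulator(val, noise, trials=1000, rng=random):
    # int() plays the role of the quantizer; return the mean of the
    # quantized samples
    return sum(int(simval(val, noise, rng)) for _ in range(trials)) / trials

rng = random.Random(2004)  # assumed seed, only so that runs repeat

no_noise = simulator(1.3, 0.0, rng=rng)            # every sample quantizes to 1
with_noise = simulator(1.3, 1.0, 100000, rng=rng)  # mix of 1s and 2s

print(no_noise, with_noise)
```

The noise-free mean is exactly 1.0 however many trials are run, while the
noisy mean gets closer to 1.3 as the trial count grows.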

I suspect that the ear integrates/averages some when presented with 44.1k samples/sec,
so if uniform noise is added in below the quantization lsb of a CD, that may enhance
the perceived output sound, but some audiophile can provide the straight scoop on that.
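
A rough sketch of the averaging half of that suspicion, and nothing more: it
says nothing about how hearing actually works. It quantizes a sine wave whose
amplitude is below half a quantization step, with and without uniform noise
one step wide, and then estimates the amplitude by correlating against the
reference sine. The names quantize and amplitude, the 0.4-step amplitude, the
period and the seed are all assumptions chosen for illustration.

```python
import math
import random

def quantize(x):
    # Round to the nearest integer; one integer step stands in for one LSB
    return math.floor(x + 0.5)

rng = random.Random(1)   # assumed seed, for repeatability
n = 20000                # number of samples (a whole number of periods)
period = 50              # samples per cycle, arbitrary

# A sine with amplitude 0.4 LSB, i.e. entirely below the rounding threshold
signal = [0.4 * math.sin(2 * math.pi * i / period) for i in range(n)]

plain = [quantize(s) for s in signal]
# Uniform noise in [-0.5, 0.5) added before quantizing
dithered = [quantize(s + rng.random() - 0.5) for s in signal]

def amplitude(samples):
    # Correlate with the reference sine to estimate the sine amplitude
    total = sum(q * math.sin(2 * math.pi * i / period)
                for i, q in enumerate(samples))
    return 2.0 * total / len(samples)

amp_plain = amplitude(plain)        # every sample rounded to 0
amp_dithered = amplitude(dithered)  # averaging recovers roughly 0.4

print(amp_plain, amp_dithered)
```

Without the noise every sample rounds to zero and the signal is gone; with it,
each individual sample is noisier but the integrated estimate comes back near
the true amplitude.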

>I agree with the idea that exact interval tracking is important, but
>perhaps this exact interval tracking should be used only during
>testing and development of the code. 
>
>It could be that it would be possible to produce code with a fixed
>number of least significant bits that are randomly rounded each time
>some specific operation makes this necessary (not *all* computations!)
>and that the floating point data would stay accurate enough for long
>enough to be useable in 99.9 percent of the use cases.
>
I think you have to be careful when you do your rounding, and note
the effect on values vs populations of values and how that feeds the
next stage of processing or use.
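
To make the values-vs-populations point concrete, here is a minimal sketch
comparing round-to-nearest with one possible reading of "random rounding":
round up with probability equal to the fractional part. That reading, the
helper names round_nearest and round_random, and the integer-step accumulator
are assumptions for illustration, not anything specified in this thread.

```python
import math
import random

def round_nearest(x):
    # Deterministic rounding to the nearest integer step
    return math.floor(x + 0.5)

def round_random(x, rng):
    # Round up with probability equal to the fractional part, so the
    # expected result equals x even though any single result can be
    # off by almost a full step
    lo = math.floor(x)
    return lo + (1 if rng.random() < x - lo else 0)

rng = random.Random(42)  # assumed seed, for repeatability
steps = 10000
acc_nearest = 0
acc_random = 0
for _ in range(steps):
    # Add 0.3 to an accumulator that can only hold whole steps
    acc_nearest = round_nearest(acc_nearest + 0.3)
    acc_random = round_random(acc_random + 0.3, rng)

# The unrounded total would be 0.3 * steps = 3000
print(acc_nearest, acc_random)
```

Round-to-nearest throws away every increment, so that accumulator never
leaves 0. The randomly rounded one ends up near 3000, although any single
rounded value is a worse estimate of its own input. Which behavior you want
depends on whether the next stage consumes individual values or aggregates.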

>Maybe we need a DSP-float instead of a decimal data type? Decimals
>could be used for testing DSP-float implementations.
>
I'm not sure what DSP-float really means yet ;-)
HTH, gotta go.

Regards,
Bengt Richter