[Python-ideas] Python Numbers as Human Concept Decimal System

Andrew Barnert abarnert at yahoo.com
Sat Mar 8 08:53:40 CET 2014


From: Steven D'Aprano <steve at pearwood.info>

Sent: Friday, March 7, 2014 10:36 PM


> On Fri, Mar 07, 2014 at 09:33:38PM -0800, Andrew Barnert wrote:
> 
>>  Today, given any real, Decimal(float('real')) always gives you the 
>>  value of the closest binary float to that real. With this change, it 
>>  will sometimes give you the value of the second-closest 
>>  repr-of-a-binary-float to that real.
> 
> Please don't talk about "reals". "Real" has a technical 
> meaning in 
> mathematics, and no computer number system is able to represent the 
> reals. In Python, we have floats, which are C doubles, and Decimals.


I am using "real" in the technical sense. That's the whole crux of the problem: you want to use the real[1] number 0.1 in Python, and you can't, because there is no IEEE double that matches that value. If we ignore reals and only talk about floats, then 0.1 and 0.100000000000000012 are the same number, so there is no problem.

    [1]: I suppose we could talk about rationals instead of reals here, because (a) you can't enter irrationals as literals, and (b) Guido's proposal obviously doesn't attempt to help with them. I don't think that makes a difference here, but I could be wrong; if I am, please point out below where this leads me astray.

Put another way: this cannot be just about converting IEEE doubles to Decimals, because Python already does that perfectly. It's about recovering information that was lost when a real number (or, if you prefer, a rational number, or a Python literal) was represented as an IEEE double. The repr trick recovers it by taking advantage of two facts: (a) humans tend to type numbers like 0.1 far more often than numbers like 0.100000000000000005551, and (b) we know the precision of an IEEE double, so we can guarantee we're not losing any real information.
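
A minimal sketch of the distinction, using nothing but the stdlib (the digit strings in the comments are what current CPython actually prints):

    from decimal import Decimal

    # Decimal(float) is exact: it reproduces every bit of the IEEE double.
    print(Decimal(0.1))
    # 0.1000000000000000055511151231257827021181583404541015625

    # Going through repr() instead recovers the shortest decimal string
    # that round-trips to the same double -- usually what the human typed.
    print(Decimal(repr(0.1)))
    # 0.1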

> Putting that aside, I'm afraid I don't understand how Guido's 
> suggestion 
> to use repr() in the Decimal constructor will sometimes give the 
> second-closest value. Can you explain in more detail? If you can show an 
> example, that would be great too.


OK, take the number 0.100000000000000012.

The closest IEEE double to this number is 0.1000000000000000055511151231257827021181583404541015625. The next closest is 0.10000000000000001942890293094023945741355419158935546875. Today's design gives you the former when you write Decimal(0.100000000000000012). You get the closest one in this case, and in every case.

The closest repr of an IEEE double to this number is 0.10000000000000002. But under Guido's proposal you get 0.1 instead, because the literal first evaluates to the nearest double, and that double's repr is "0.1". So the value you end up with is farther from the number you wanted.

So, if you treat evaluate-literal-as-float-then-Decimal.from_float as a (mathematical) function, it has the property that it always gives you the closest value from its range to your input. If you treat evaluate-literal-as-float-then-repr-then-Decimal as a function, it does not have that property.
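
You can verify the arithmetic with a quick sketch, using fractions.Fraction to measure both errors exactly (the error values in the comments are rounded):

    from decimal import Decimal
    from fractions import Fraction

    target = Fraction("0.100000000000000012")  # the real number we wanted

    d = 0.100000000000000012  # the literal evaluates to the nearest double

    current  = Decimal(d)        # today's design: exact value of the double
    proposed = Decimal(repr(d))  # Guido's proposal: go through the repr

    for name, value in [("current", current), ("proposed", proposed)]:
        error = abs(Fraction(value) - target)
        print(name, value, float(error))
    # current:  error ~ 6.4e-18
    # proposed: error = 1.2e-17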

>>  This means the error across any
>>  range of reals increases. It's still below the rule-of-thumb cutoff
>>  that everyone uses for converting through floats, but it is higher by
>>  a nonzero amount that doesn't cancel out.

Was this second point clear? In case it wasn't, let me expand on it:

In the example above, Guido's proposal gives you nearly double the error (1.2e-17 vs. 6.4e-18) of the current design. Of course there are also cases where it gives a lower error, 0.1 being an obvious example. I'm pretty sure (but I don't know how to prove…) that if you integrate the absolute error over any sufficiently large[1] set of reals, it's higher under Guido's proposal than under the current design; a rough numerical spot-check is sketched after the footnote.

    [1]: The "sufficiently large" there is just so you don't choose a set so small that all of its elements map to a single IEEE double, like { 0.1 <= x < 0.100000000000000001 }.

>>  Similarly, today, the distribution of float values across the real
>>  number line is... not uniform (because of exponents), and I don't know
>>  the right technical term, but you know what it looks like. The
>>  distribution of repr-of-float values is not the same.
> 
> Nope, sorry, I don't get you.
> 
> I think that what you are trying to say is that for each binary 
> exponent, the floats with that same exponent are distributed uniformly:
> 
> |...|...|...|...|...|...|...|...|
> 
> but their reprs are not:
> 
> |...|..|..|....|...|....|..|....|
> 
> Is this what you mean?


That's part of it. But on top of that, I think (but again can't prove) that the reprs are on average skewed further to the left (toward lower values) than to the right, which means that, e.g., the average of a collection of reprs-of-floats will also be skewed in that direction. A small sketch of the spacing effect is below.
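
The sketch walks a few consecutive doubles upward from 0.1 (they are uniformly spaced in binary, exactly 2**-56 apart in this range) and prints the decimal gaps between their shortest reprs, which are not uniform (math.nextafter needs Python 3.9+):

    import math
    from decimal import Decimal

    x = 0.1
    prev = None
    for _ in range(8):
        r = Decimal(repr(x))
        if prev is not None:
            print(r, "gap from previous repr:", r - prev)
        else:
            print(r)
        prev = r
        x = math.nextafter(x, 1.0)  # step to the next representable double
    # on a 64-bit IEEE build the printed gaps alternate between 1E-17 and
    # 2E-17, even though the doubles themselves are exactly
    # 2**-56 ~ 1.39e-17 apart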

So, those are my three points. Remember that I don't know if any of them actually matter to anyone, and in fact am hoping they do not.

