[Python-ideas] Python Numbers as Human Concept Decimal System

Sun Mar 9 20:34:12 CET 2014

I would dearly like to put this thread to rest, as it has strayed mightily
from the topic of improvements to Python, and all points of view have been
amply defended. I'm hoping to hear from Cowlishaw, but I expect he'll side
with one of Mark Dickinson's proposals. I hope that somebody will take up
the task of writing a PEP about introducing a decimal literal; that sounds
like an attainable goal, less controversial than changing Decimal(<float>).

I did do some more thinking about how the magic repr() affects the
distribution of values, and came up with an example of sorts that might
show what it does. We've mostly focused on simple values like 1.1, but to
understand the distribution issue it's better to look at a very large value.

I took 2**49 as an example and added a random fraction. When printed this
always gives a single digit past the decimal point, e.g. 562949953421312.5.
Then I measured the distribution of the last digit. What I found matched my
prediction: the digits 0, 1, 2, 4, 5, 6, 8, 9 occurred with roughly equal
probability (1/8th). So 3 and 7 are completely missing.
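
A minimal sketch of the experiment described above, assuming CPython's
shortest-round-trip repr() for floats. The function name
last_digit_histogram, the sample count and the seed are illustrative
choices, not details of the original measurement.

```python
import random
from collections import Counter


def last_digit_histogram(samples=100_000, seed=0):
    """Count the final digit of repr(2**49 + random fraction)."""
    rng = random.Random(seed)      # seeded so the run is repeatable
    base = 2.0 ** 49               # 562949953421312.0
    counts = Counter()
    for _ in range(samples):
        text = repr(base + rng.random())
        counts[text[-1]] += 1      # the single digit after the decimal point
    return counts


if __name__ == "__main__":
    counts = last_digit_histogram()
    total = sum(counts.values())
    for digit in "0123456789":
        print(digit, round(counts[digit] / total, 4))
```

Running it should show eight digits at roughly 0.125 each and a count of
zero for 3 and 7.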

The explanation is simple enough: using the (current) Decimal class it's
easy to see that there are only 8 possible actual values, whose fractional
part is a multiple of 1/8. IOW the exact values end in .000, .125, .250,
.375, .500, .625, .750, .875. (*) The conclusion is that there are only 3
bits represented after the binary point, and repr() produces a single digit
here, because that's all that's needed to correctly round back to the 8
possible values. So it picks the digit closest to each of the possible
values, and when two digits are equally close it picks one. I don't know how
it picks, but it is reproducible -- in this example it always chooses .2 to
represent .250, and .8 to represent .750. The exact same thing happens
later in the decimal expansion for smaller numbers.
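
The eight exact values and the digit repr() picks for each can be listed
directly. This is a sketch, with BASE, exact_and_short and round_trips as
illustrative names. It also checks that both candidate digits for .250 and
.750 round back to the same float, which is what makes the choice a tie. In
this example the chosen digits (2 and 8) are the even ones; that is only an
observation about these two cases, not a statement of the rule repr()
follows.

```python
from decimal import Decimal

BASE = 2.0 ** 49  # floats this large have 3 bits after the binary point


def exact_and_short(k):
    """Return (exact value, repr string) for BASE + k/8, with k in 0..7."""
    x = BASE + k / 8               # k/8 is exact, and so is the sum
    return Decimal(x), repr(x)     # Decimal(float) converts without rounding


def round_trips(text, k):
    """True if the decimal string parses back to exactly BASE + k/8."""
    return float(text) == BASE + k / 8


if __name__ == "__main__":
    for k in range(8):
        exact, short = exact_and_short(k)
        print(exact, "->", short)

    # Both neighbouring digits identify the same float in the two tie cases.
    print(round_trips("562949953421312.2", 2), round_trips("562949953421312.3", 2))
    print(round_trips("562949953421312.7", 6), round_trips("562949953421312.8", 6))
```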

I think that the main objection to this distribution is that, while the
exact distribution is evenly spaced (the possibilities are 1/8th away from
each other), the rounded distribution has some gaps of 1/10th and some of
1/5th. I am not enough of a statistician to know whether this matters (the
distribution of *digits* in the exact values is arguably less randomized
:-) but at least this example clarified my understanding of the phenomenon
we're talking about when we discuss distribution of values.
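
The uneven spacing can be made concrete by comparing the gaps between
consecutive exact values with the gaps between what repr() prints for them.
This is only a sketch; BASE and spacing are illustrative names, and the
ninth point (the next integer) is included just to close the interval.

```python
from decimal import Decimal

BASE = 2.0 ** 49


def spacing():
    """Return (exact gaps, printed gaps) across BASE .. BASE + 1."""
    floats = [BASE + k / 8 for k in range(9)]         # 9 points, 8 gaps
    exact = [Decimal(x) for x in floats]              # exact binary values
    printed = [Decimal(repr(x)) for x in floats]      # values as displayed
    exact_gaps = [b - a for a, b in zip(exact, exact[1:])]
    printed_gaps = [b - a for a, b in zip(printed, printed[1:])]
    return exact_gaps, printed_gaps


if __name__ == "__main__":
    exact_gaps, printed_gaps = spacing()
    print([str(g) for g in exact_gaps])    # eight gaps of 0.125
    print([str(g) for g in printed_gaps])  # 0.1, 0.1, 0.2, 0.1, 0.1, 0.2, 0.1, 0.1
```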

________
(*) I was momentarily startled to find that the set of Decimals produced
contained 9 items, until I realized that some random() call must have
produced a value close enough to 1 to be rounded up.
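
A small sketch of how the ninth value arises, assuming the default
round-to-nearest float addition. With only 3 fractional bits available, a
random() result in roughly the top 1/16 of its range makes the sum round up
to the next integer. The name distinct_exact_values, the sample count and
the seed are illustrative.

```python
import random
from decimal import Decimal


def distinct_exact_values(samples=10_000, seed=0):
    """Collect the distinct exact values of 2**49 + random()."""
    rng = random.Random(seed)
    base = 2.0 ** 49
    return {Decimal(base + rng.random()) for _ in range(samples)}


if __name__ == "__main__":
    values = distinct_exact_values()
    print(len(values))   # 9: eight fractions plus the next integer
    print(max(values))   # 562949953421313
```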

-- 
--Guido van Rossum (python.org/~guido)