PEP 327: Decimal Data Type

Sat Jan 31 04:01:41 EST 2004

Josiah Carlson <jcarlson at nospam.uci.edu> wrote in message news:<bvef14$919$1 at news.service.uci.edu>...
> > (In my dreams) I want to "float" to be decimal. Always. No more binary.

I disagree.

My reasons for this have to do with the real-life meaning of figures
with decimal points.  I can say that I have $1.80 in change on my
desk, and I can say that I am 1.80 meters tall.  But the two 1.80's
have fundamentally different meanings.

For money, it means that I have *exactly* $1.80.  This is because
"dollars" are just a notational convention for large numbers of cents.
I can just as accurately say that I have an (integer) 180 cents, and
indeed, that's exactly the way it would be stored in my financial
institution's database.  (I know because I used to work there.)  So
all you really need here is "int".  But I do agree with the idea of
having a class to hide the decimal/integer conversion from the user.
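
A minimal sketch of what such a class might look like.  The name
"Money" and its methods are made up for illustration, and it only
handles non-negative amounts with at most two fractional digits:

```python
class Money:
    """Hypothetical wrapper that stores an amount as an integer number of cents."""

    def __init__(self, cents):
        self.cents = cents

    @classmethod
    def parse(cls, text):
        # Split "1.80" into "1" and "80" so we never touch binary floats.
        dollars, _, frac = text.partition(".")
        # Pad "8" to "80" and "" to "00"; extra digits are truncated.
        return cls(int(dollars) * 100 + int(frac.ljust(2, "0")[:2]))

    def __add__(self, other):
        # Plain integer addition, so the result is exact.
        return Money(self.cents + other.cents)

    def __str__(self):
        # Convert back to the familiar dollars.cents notation.
        return "%d.%02d" % divmod(self.cents, 100)


change = Money.parse("1.80") + Money.parse("0.25")
print(change.cents)  # 205
print(change)        # 2.05
```

The user only ever sees "1.80"; everything underneath is int.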

On the other hand, when I say that I am 1.80 m tall, it doesn't imply
that human height comes in discrete packets of 0.01 m.  It means that
I'm *somewhere* between 1.795 and 1.805 m tall, depending on my
posture and the time of day, and "1.80" is just a convenient
approximation.  And it wouldn't be inaccurate to express my height as
0x1.CC (=1.796875) or (base 12) 1.97 (=1.7986111...) meters, because
these are within the tolerance of the measurement.  So number base
doesn't matter here.
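
The two conversions above are easy to check.  This is just a quick
sketch using exact rational arithmetic; the variable names are mine:

```python
from fractions import Fraction

# The tolerance band implied by writing "1.80 m".
low, high = Fraction("1.795"), Fraction("1.805")

# 0x1.CC: the digits "1CC" read in base 16, with two fractional places.
hex_height = Fraction(int("1CC", 16), 16 ** 2)

# (base 12) 1.97: the digits "197" read in base 12, with two fractional places.
doz_height = Fraction(int("197", 12), 12 ** 2)

print(float(hex_height))  # 1.796875
print(low <= hex_height <= high)  # True
print(low <= doz_height <= high)  # True
```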

But even if the number base of a measurement doesn't matter, precision
and speed of calculations often do.  And on digital computers,
non-binary arithmetic is inherently imprecise and slow.  Imprecise
because register bits are limited and decimal storage wastes them. 
(For example, representing the integer 999 999 999 requires 36 bits in
BCD but only 30 bits in binary.  Also, for floating point, only binary
allows the precision-gaining "hidden bit" trick.)  Slow because
decimal requires more complex hardware.  (For example, a BCD adder has
more than twice as many gates as a binary adder.)
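
The bit counts in the storage example can be reproduced in a couple
of lines (a sketch, assuming plain BCD at 4 bits per decimal digit):

```python
n = 999999999

# Bits needed to hold n as an unsigned binary integer.
binary_bits = n.bit_length()

# Plain BCD spends one 4-bit nibble on each decimal digit.
bcd_bits = 4 * len(str(n))

print(binary_bits)  # 30
print(bcd_bits)     # 36
```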

> In my dreams, data is optimally represented in base e, and every number 
> is represented with a roughly equivalent amount of fudge-factor (except 
> for linear combinations of the powers of e).
> 
> Heh, thankfully my dreams haven't come to fruition.

Perhaps we'll have an efficient implementation within the next
102.1120... years or so ;-)

> While decimal storage is useful for...money

Out of curiosity: Is there much demand for decimal floating point in
places that have a fractionless currency, like the Japanese yen?

> Perhaps a generalized BaseN module is called for.  People 
> could then generate floating point numbers in any base (up to perhaps 
> base 36, [1-9a-z]).

If you're going to allow exact representation of multiples of 1/2,
1/3, 1/4, ..., 1/36, 1/49, 1/64, 1/81, 1/100, 1/121, 1/125, 1/128,
1/144, etc., I see no reason not to have exact representations of
*all* rational numbers.  Especially considering that rationals are
much easier to implement.  (See below.)
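
To illustrate what exact rationals buy you, here is a small sketch.
It assumes a Python that ships the standard-library "fractions"
module; any rational class with the usual operators would do:

```python
from fractions import Fraction

# Denominators that no single small base handles exactly.
total = Fraction(1, 3) + Fraction(1, 49) + Fraction(1, 125)
print(total)  # an exact numerator/denominator pair, no rounding

# Thirds round-trip exactly as rationals...
print(Fraction(1, 3) * 3 == 1)  # True

# ...while binary floats cannot even add tenths exactly.
print(0.1 + 0.2 == 0.3)  # False
```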

> ... Of course then you have the same problem with doing math on two 
> different bases as with doing math on rational numbers.

Actually, the problem is even worse.

Like rationals, BaseN numbers have the problem that there are multiple
representations for the same number (e.g., 1/2=6/12, and 0.1 (2) = 0.6
(12)).  But rationals at least have a standardized normalization.  We
can agree that 1/2 should be represented as 1/2 and not
-131/-262, but should BaseN('0.1', base=2) + BaseN('0.1', base=4) be
BaseN('0.11', 2) or BaseN('0.3', 4)?
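
A sketch of the ambiguity.  There is no real BaseN class here;
"to_base" is a hypothetical helper of mine that renders a
non-negative Fraction in a given base:

```python
from fractions import Fraction

DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def to_base(value, base, max_digits=20):
    """Render a non-negative Fraction as a positional string in `base`."""
    whole = value.numerator // value.denominator
    frac = value - whole
    # Integer part: peel off digits from the least significant end.
    int_digits = ""
    while True:
        whole, d = divmod(whole, base)
        int_digits = DIGITS[d] + int_digits
        if whole == 0:
            break
    # Fractional part: multiply by the base and take the integer part.
    frac_digits = ""
    while frac and len(frac_digits) < max_digits:
        frac *= base
        d = frac.numerator // frac.denominator
        frac_digits += DIGITS[d]
        frac -= d
    return int_digits + ("." + frac_digits if frac_digits else "")


# Rationals normalize themselves: there is one agreed-upon form.
print(Fraction(-131, -262))  # 1/2

# The sum is one number, but it has two equally valid BaseN spellings.
total = Fraction(1, 2) + Fraction(1, 4)
print(to_base(total, 2))  # 0.11
print(to_base(total, 4))  # 0.3
```

Nothing in the arithmetic says which spelling the result should carry;
that choice would have to be an arbitrary rule in BaseN.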

The same potential problem exists with ints, but Python (and afaik,
everything else) avoids it by internally storing everything in binary
and not keeping track of its representation.  This is why "print 0x68"
produces the same output as "print 104".  BaseN would violate this
separation between numbers and their notation, and imho that would
create a lot more problems than it solves.

Including the problem that mixed-base arithmetic will require:
* approximating at least one of the numbers, in which case there's no
advantage over binary, or
* finding a "least common base", but what if that base is greater than
36 (or 62 if lowercase digits are distinguished from uppercase ones)?
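
To make the second option concrete, here is a sketch under my own
reading of "least common base": the smallest base in which every
fraction that terminates in either input base still terminates.  That
is the product of the distinct primes dividing either base.  The
function names are hypothetical:

```python
def prime_factors(n):
    """Return the set of distinct prime factors of n (n >= 2)."""
    factors = set()
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors.add(p)
            n //= p
        p += 1
    if n > 1:
        factors.add(n)  # whatever is left over is itself prime
    return factors


def least_common_base(a, b):
    """Smallest base whose prime factors cover those of both a and b."""
    result = 1
    for p in prime_factors(a) | prime_factors(b):
        result *= p
    return result


print(least_common_base(2, 4))    # 2
print(least_common_base(10, 12))  # 30
print(least_common_base(10, 7))   # 70, already past the 36 (or 62) digit limit
```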