conversion priority for Python (Thoughts on PEP 238, 240 & 242)

Jeffrey C. Jacobs jacobs at itd.nrl.navy.mil
Thu Aug 15 18:48:25 EDT 2002


I have been doing some thinking recently on the changes to py2.2 and
have to say first off I'm glad Guido is merging the types, and I'm
especially glad about PEP 238!

BUT, having read some of the PEPs and the arguments for and against,
my thoughts really relate to all three PEPs above, even though you'd
probably have preferred me to join an existing thread rather than
start this new one.

I think the crux of my argument can be summarized in 1 word:
Precision.  That is to say, there seems to be a general assumption
(stated explicitly, for instance, in PEP 242) that float is "more
precise" than any int type.  Taking 242 into account, that float need
not necessarily be the 1:11:52 sign:exponent:mantissa layout specified
by IEEE double-precision notation.  For simplicity, however, even if
we assume float has more than 52 bits of precision, the pyLong is
clearly built with more robustness in mind.

Thus, one could argue that many, if not all, long numbers are in fact
MORE precise than a corresponding fixed-width floating point
representation.  Granted, PEP 242 would create platform-dependent
solutions to the problems with the C double on 32-bit architectures,
but it still limits things to pre-defined types, if I am reading it
correctly.
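
As a quick illustration (the exact cut-off depends on the platform's C
double, but 53 significand bits is typical, and the number below is
just one I picked for the demonstration):

>>> n = 2L**60 + 1         # a long that needs 61 bits
>>> long(float(n)) == n    # a round trip through a C double drops the low bit
False
>>> n - long(float(n))
1L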

It seems PEP 240 is going for a different approach: the rational.  The
canonical way of writing a rational with pyLong precision was to
simply multiply everything by the denominator, but of course that is
hard to remember and keep track of, and therefore does leave room for
improvement.
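
To make the scaled-integer trick concrete (a toy example of my own,
not anything from the PEP): pick a common denominator, work in longs,
and only interpret the scale at the end.

>>> scale = 12L             # common denominator, tracked by hand
>>> a = scale // 3          # 1/3 becomes 4/12
>>> b = scale // 4          # 1/4 becomes 3/12
>>> a + b, scale            # 7/12, exact, and no floats involved
(7L, 12L)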

Now, as I stated before, I like PEP 238!  BUT, there is one thing I
was discussing with a friend yesterday that I found a bit annoying:

>>> 2 / 1
2.0

As he explained, the logic is likely a desire to return a CONSISTENT
type from division, and if you WANTED an int result, you use //.  I
agree that:

>>> 1/2
0.5

Is the right way to go, but if the result is a whole number anyway, do
we really need to convert it to a float?  Granted, under the hood,
division is coercing the two ints to floats and dividing, to avoid an
"is it evenly divisible" pre-check (via modulo?), but what happens
when you use longs...

>>> rbn
# A Really Big Number (pyLong)
>>> long(rbn**2 / rbn)
# First 16 or so digits of rbn are correct, rest are noise
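
To make that concrete with a number of my own choosing (any long well
past a double's ~53 significand bits will do; this assumes PEP 238
true division is in effect, e.g. via "from __future__ import
division"):

>>> rbn = 10L**30               # a 100-bit long, well past a double's precision
>>> long(rbn**2 / rbn) == rbn   # true division goes through C doubles
False
>>> rbn**2 // rbn == rbn        # floor division stays in exact long arithmetic
True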

Theoretically, I would expect the result of the above calculation to
be rbn.  Yes, I could use // but what if I didn't know whether the
expression divides evenly?  Generally speaking, if rbn is a long
represented by n bits, then it is exact, i.e. known to a relative
precision of about one part in 2**n.  If n is no more than 52, the C
double, for instance, will represent it precisely.  However, if n is
greater than the mantissa (bits of precision) of the native floating
point representation, there will be a loss of precision compared to
what the author may have intended.

Take for example a function choose:

>>> def choose(n, k):
...     # Calculates the statistical (n k) or "n choose k"
...     result = 1
...     for i in range(k):
...         result = result * (n - i) / (i + 1)
...     return result

The idea here is to write a "faster choose" function than something
that would multiply out all the n(n-1)(n-2)...(n-k+1) terms in the
numerator before dividing by k! in the denominator.  However, this
solution produces intermediate decimals, which is good in that we
don't lose precision at each iteration when (n-i) is not divisible by
(i+1).  OTOH, as n grows, there is a rapid potential for the result to
grow past the 52-bit precision inherent in the C double.  Also,
because one would ASSUME the choose function would return a long,
since choose only operates on integers, one has to do an explicit cast
at the end.  That seems unavoidable, though, because in this case I
WANT the to-decimal coercion.  But again, precision...
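
As an aside, and purely as a sketch of my own (not something any of
the PEPs propose): if you do the division with // inside the loop, the
intermediate results stay exact, because after step i the running
product is choose(n, i+1), which is always a whole number, so the
floor division never throws anything away.

>>> def choose_exact(n, k):
...     # Same loop, but // keeps everything in (long) integer arithmetic,
...     # so there is no float coercion and no 52-bit ceiling.
...     result = 1L
...     for i in range(k):
...         result = result * (n - i) // (i + 1)
...     return result
...
>>> choose_exact(5, 2)
10L
>>> choose_exact(100, 50)
100891344545564193334812497256L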

So in principle a rational class is a good idea, BUT values like "pi"
would still be a C-float, and certainly no irrational number could be
represented any better by a rational.

OTOH, the real.py library supports arbitrary-precision reals.  This is
a great idea IMHO, but the precision is defined in the module header,
IIRC.

Now, I am aware of and have used real.py and think it's a great
library!  But it does have one problem IMHO, which again has to do
with precision.  Yes, you can configure it to any precision you want,
but it can't "figure out" the precision from the context.  The
precision is fixed at the module level, IIRC, and set through a
property of the real.py module.  It is certainly true that, unlike a
Long, which will never extend infinitely to the left of the decimal
point -- otherwise it WOULD be infinity! -- a real's digits can extend
infinitely to the right, and carrying infinitely many digits would be
impractical in most situations.

So I like that there is a configurable precision in real.py.  Ideally,
longs should NEVER be converted to the less-precise C float, and
real.py makes a great alternative.  But the fixed precision can still
lead to a loss of information going from the long to the real if the
precision is less than the number of digits in the long.

I therefore propose the following for your consideration:

Continuing with the real.py concept and the IEEE s:o:m layout, where s
is the sign bit, o is the ordinate (a signed power-of-2 exponent) and
m is the mantissa, such that x = s:o:m = (-1)**s * 1.m * 2**o: rather
than fixing the number of bits in o and m, could we not allow
contextual precision?  That is, if a pyLong MUST be converted to a
float, and that long is n bits wide, could we not assign it to a float
such that m >= n-1 and o >= log_2(n) + 1?  These fields would not
expand infinitely, but rather would take their cues from the long you
were converting from, expanding or contracting as necessary.  They
could expand in multiplication, for instance, but maybe they would at
most stay the same for division, since rational numbers would still
suffer a loss of precision.  OTOH, a division might result in a number
of leading zeros in "o" or trailing zeros in "m", in which case a
contraction would be justified.  Having done a cursory exam, I think
real.py could probably be modified to do this without too much work,
though it would have to be all C code and integrated into the implicit
coercion rules, which would be tricky.
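
To illustrate the sizing rule I have in mind (purely my own sketch --
not real.py's API and not anything proposed in the PEPs), here is how
the field widths might be chosen from the long being converted:

>>> def contextual_widths(value):
...     # Toy illustration: pick mantissa and exponent widths just big
...     # enough that converting this particular long is lossless.
...     v, bits = abs(value), 0
...     while v:                      # count the bits in the long
...         v >>= 1
...         bits += 1
...     mantissa_bits = max(bits - 1, 1)    # leading 1 is implicit, as in IEEE
...     exp_bits = 1
...     while (1L << exp_bits) <= bits:     # enough room to hold the exponent
...         exp_bits += 1
...     return mantissa_bits, exp_bits + 1  # +1 for the exponent's sign
...
>>> contextual_widths(10L**30)        # a 100-bit long needs ~99 mantissa bits
(99, 8)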

Anyway, that's my 2c on these issues.  Otherwise, I'm strongly
pro-238, on the fence about 240, and apathetic towards 242, as it
likely won't affect me on a Mac, Windows 2000 or x86 Linux, those
being the only platforms I'm using these days (no need for installing
it on the Sun).  So, any flames^h^h^h^h^h^h^h thoughts?

Jeffrey.


