[Python-Dev] [Python-checkins] r64424 - in python/trunk: Include/object.h Lib/test/test_sys.py Misc/NEWS Objects/intobject.c Objects/longobject.c Objects/typeobject.c Python/bltinmodule.c

Fri Jun 27 00:00:23 CEST 2008

Raymond Hettinger wrote:
> From: "Guido van Rossum" <guido at python.org>
>> Let's step back and discuss the API some more.
>>
>> - Do we need all three?
> 
> I think so -- see the reasons below.

I would prefer one; see below.

>  Of course, my first choice was 
> not on your list.  To me, the one obvious way to convert a number to an
> eval-able string in a different base is to use bin(), oct(), or hex().  
> But that appears to be off the table for reasons that I've read but 
> don't make any sense to me.

Let me try.  I am one of those who prefer a smaller core language to a
bigger one, because it is easier to learn and teach.  But, to me, there
is a deeper consideration that applies here.  A Python interpreter, human
or mechanical, must do exact integer arithmetic.  But a Python interpreter
does not have to convert float literals to fixed-size binary and does
*not* have to do float arithmetic with binary representations that are
usually approximations.  (Indeed, human interpreters do neither, which
is why they are often surprised at CPython's float output, and which is
why this function will be useful.)  If built-in functions are part of
the language definition, as Guido just clarified, their definition and
justification should not depend on the float implementation.
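The approximation is easy to see with the standard fractions module: converting a float to a Fraction is exact, so it shows which rational number the literal .6 actually became.  This assumes an IEEE 754 binary64 float, which is what CPython uses on mainstream platforms but which the language itself does not promise.

```python
from fractions import Fraction

# Fraction(float) is exact: it recovers the rational value the float really stores.
stored = Fraction(0.6)
intended = Fraction(6, 10)

# With binary64 floats the stored value is a 53-bit integer over 2**53,
# close to 3/5 but not equal to it.
print(stored)              # 5404319552844595/9007199254740992
print(stored == intended)  # False
print(stored < intended)   # True
```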

> It seems simple enough, extendable enough, and clean enough
> for bin/oct/hex to use __index__ if present and __float__ if not.

To me, a binary representation, in whatever base, of a Decimal is 
senseless.  The point of this issue is to reveal the exact binary bit 
pattern of float instances.
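For reference, that bit pattern can already be pulled apart with the struct module.  A minimal sketch, assuming the float is an IEEE 754 binary64 double (an implementation detail, which is rather the point above); float_fields is a hypothetical helper name:

```python
import struct
from typing import Tuple

def float_fields(x: float) -> Tuple[int, int, int]:
    """Return the (sign, biased exponent, fraction) fields of a binary64 double."""
    # Reinterpret the 8 bytes of the double as a big-endian unsigned 64-bit integer.
    (raw,) = struct.unpack('>Q', struct.pack('>d', x))
    sign = raw >> 63                   # 1 bit
    exponent = (raw >> 52) & 0x7FF     # 11 bits, biased by 1023
    fraction = raw & ((1 << 52) - 1)   # 52 bits; the leading 1 bit is implicit
    return sign, exponent, fraction

sign, exponent, fraction = float_fields(0.6)
# Unbiased exponent and the 52 stored fraction bits of 0.6.
print(sign, exponent - 1023, format(fraction, '052b'))
```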

>> - If so, why not .tobase(N)? (Even if N is restricted to 2, 8 and 16.)
> 
> I don't think it's user-friendly to have the float-to-bin API
> fail to parallel the int-to-bin API.  IMO, it should be done
> the same way in both places.

I would like to turn this around.  I think that three nearly identical
built-ins are two too many.  I am going to propose on the Py3 list that bin,
oct, and hex be condensed into one function, bin(integer, base=2, 8, or 16),
for 3.1 if not 3.0.  Bases 8 and 16 are, to me, compressed binary.
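Roughly, the proposed consolidation could behave like this under Python 3.  to_base is a hypothetical name standing in for the single bin(integer, base) function; this sketch simply delegates to the existing built-ins:

```python
def to_base(n: int, base: int = 2) -> str:
    """Hypothetical single replacement for bin(), oct() and hex()."""
    converters = {2: bin, 8: oct, 16: hex}
    if base not in converters:
        raise ValueError("base must be 2, 8, or 16")
    return converters[base](n)

print(to_base(10), to_base(10, 8), to_base(10, 16))  # 0b1010 0o12 0xa
```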

Three methods is definitely too many for a somewhat subsidiary function.
So, I would like to see float.bin([base=2]).
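A minimal sketch of float.bin([base=2]) as a plain function, assuming the output format Raymond quotes for bin(.6): an integer mantissa times a power of two, so that eval() round-trips.  float_bin is a hypothetical name, and rendering the base-8 and base-16 mantissas with oct() and hex() is an assumption about what the base argument would mean.  The functools.partial line shows the route for someone who always wants base 16:

```python
import math
from functools import partial

def float_bin(x: float, base: int = 2) -> str:
    """Hypothetical stand-in for the proposed float.bin([base=2])."""
    if not math.isfinite(x):
        raise ValueError("only finite floats have an exact n * 2**-e form")
    render = {2: bin, 8: oct, 16: hex}.get(base)
    if render is None:
        raise ValueError("base must be 2, 8, or 16")
    # as_integer_ratio() is exact; for a finite float the denominator is a power of two.
    n, d = x.as_integer_ratio()
    exp = d.bit_length() - 1  # d == 2**exp
    return f"{render(n)} * 2.0**-{exp}"

# Someone who always wants base 16 can fix the argument once.
float_hex_expr = partial(float_bin, base=16)

s = float_bin(0.6)
print(s)
print(eval(s) == 0.6)  # True: the string evaluates back to the same float
print(float_hex_expr(0.6))
```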

> I don't find it attractive in appearance.  Any use case I can
> imagine involves multiple calls using the same base and I would likely 
> end up using functools.partial or somesuch
> to factor out the repeated use of the same variable. 

Make the base that naive users want to see the default.  I believe this 
to be 2.  Numerical analysts who want base 16 can deal with partial if 
they really have scattered calls (as opposed to a few within loops) and
cannot deal with typing '16' over and over.

>>>> bin(.6)
> '0b10011001100110011001100110011001100110011001100110011 * 2.0**-53'
...
> Both of those bits of analysis become awkward with the tobase() method:
>>>> (.6).tobase(2)

Eliminate the unneeded parentheses and default value, and this is
 >>> .6.bin()
which is just one character longer than bin(.6).

>> - What should the output format be? I know you originally favored
>> 0b10101.010101 etc. Now that it's not overloaded on the bin/oct/hex
>> builtins, the constraint that it needs to be an eval() able expression
>> may be dropped (unless you see a use case for that too).
> 
> The other guys convinced me that round tripping was important
> and that there is a good use case for being able to read/write
> precisely specified floats in a platform independent manner.

Definitely.  The paper I referenced in the issue discussion
(http://bugs.python.org/issue3008, mentioned a few times here) is
http://hal.archives-ouvertes.fr/docs/00/28/14/29/PDF/floating-point-article.pdf

> Also, my original idea didn't scale well without exponential
> notation -- i.e.  bin(125E-100) would have a heckofa lot
> of leading zeroes.   Terry and Mark also pointed out that
> the hex with exponential notation was the normal notation
> used in papers on floating point arithmetic.  Lastly, once I
> changed over to the new way, it dramatically simplified the
> implementation.

I originally thought I preferred the 'hexponential' notation that uses P 
for power instead of E for exponential.  But with multiple bases, the 
redundancy of repeating the bases is ok, and being able to eval() 
without changing the parser is a plus.  But I would prefer losing the 
spaces around the ** operator.
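For comparison, both the 'hexponential' P form and an eval()-able form without spaces around ** can be derived from float.hex(), which exists (together with float.fromhex()) in Python 2.6 and later.  A sketch that assumes finite input; hex_to_expr is a hypothetical helper, not an existing API:

```python
import math

def hex_to_expr(x: float) -> str:
    """Hypothetical helper: turn float.hex()'s P notation into an eval()-able string."""
    if not math.isfinite(x):
        raise ValueError("inf and nan have no P-notation mantissa")
    s = x.hex()                       # e.g. '0x1.3333333333333p-1'
    sign = ''
    if s.startswith('-'):
        sign, s = '-', s[1:]
    mantissa, exp = s[2:].split('p')  # drop '0x', split at the binary exponent
    whole, _, frac = mantissa.partition('.')
    # Fold the hex fraction digits into the integer; each hex digit is 4 bits.
    # Note: -0.0 comes back as 0.0, since the integer mantissa cannot carry the sign.
    e = int(exp) - 4 * len(frac)
    return f"{sign}0x{whole}{frac}*2.0**{e}"

x = 0.6
print(x.hex())               # 0x1.3333333333333p-1
print(hex_to_expr(x))        # 0x13333333333333*2.0**-53
print(float.fromhex(x.hex()) == x, eval(hex_to_expr(x)) == x)  # True True
```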

Terry Jan Reedy