[Python-Dev] Unifying Long Integers and Integers

Guido van Rossum guido@digicool.com
Sun, 11 Mar 2001 18:25:13 -0500


(I'm splitting this into separate replies, one per PEP, to focus the
discussion a bit.)

> Trying once again for the sought after position of "most PEPs on the
> planet", here are 3 new PEPs as discussed on the DevDay. These PEPs
> are in a large way, taking apart the existing PEP-0228, which served
> its strawman (or pie-in-the-sky) purpose well.
> 
> Note that according to PEP 0001, the discussion now should be focused
> on whether these should be official PEPs, not whether these are to
> be accepted. If we decide that these PEPs are good enough to be PEPs,
> Barry should check them in and fix the internal references between them.

Actually, since you have SF checkin permissions, Barry can just give
you a PEP number and you can check it in yourself!

> I would also appreciate setting up a non-Yahoo list (either SF or
> python.org) to discuss those issues -- I'd rather the discussion be
> there than in my mailbox -- I had a bad experience regarding
> that with PEP-0228.

Please help yourself.  I recommend using SF since it requires less
overhead for the poor python.org sysadmins.

> (See Barry? "send a draft" isn't that scary. Bet you don't like me
> to tell other people about it, huh?)

What was that about?

> PEP: XXX
> Title: Unifying Long Integers and Integers
> Version: $Revision$
> Author: pep@zadka.site.co.il (Moshe Zadka)
> Status: Draft
> Python-Version: 2.2
> Type: Standards Track
> Created: 11-Mar-2001
> Post-History:
> 
> 
> Abstract
> 
>     Python has both integers, machine word size integral types, and
>     long integers, unbounded integral types. When integer
>     operations overflow the machine registers, they raise an
>     error. This PEP proposes to do away with the distinction, and unify
>     the types from the perspective of both the Python interpreter
>     and the C API.
> 
> Rationale
> 
>     Having the machine word size leak to the language hinders
>     portability (for example, .pyc's are not portable because of
>     that). Many programs find a need to deal with larger numbers
>     after the fact, and changing the algorithms later is not only
>     bothersome, but hinders performance on the normal case.

I'm not sure if the portability of .pyc's is much worse than that of
.py files.  As long as you don't use plain ints >= 2**31 both are 100%
portable.  *programs* can of course become non-portable, but the true
reason for the change is simply that the distinction is arbitrary and
irrelevant.

> Literals
> 
>     A trailing 'L' at the end of an integer literal will stop having
>     any meaning, and will be eventually phased out. This will be
>     done using warnings when encountering such literals. The warning
>     will be off by default in Python 2.2 and on by default for two
>     revisions, after which the suffix will no longer be supported.

Please suggest a more explicit schedule for introduction, with
approximate dates.  You can assume there will be roughly one 2.x
release every 6 months.

> Builtin Functions
> 
>     The function long will call the function int, issuing a
>     warning. The warning will be off in 2.2, and on for two
>     revisions before removing the function. A FAQ entry will be added
>     explaining that, for old modules that still need long, putting
> 
>          long=int
> 
>     at the top of the module would solve this, or
> 
>          import __builtin__
>          __builtin__.long=int
> 
>     in site.py.

There's more to it than that.  What about sys.maxint?  What should it
be set to?  We've got to pick *some* value because there's old code
that uses it.  (An additional problem here is that it's not easy to
issue warnings for using a particular constant.)

Other areas where we need to decide what to do: there are a few
operations that treat plain ints as unsigned: hex() and oct(), and the
format operators "%u", "%o" and "%x".  These have different semantics
for bignums!  (There they ignore the request for unsignedness and
return a signed representation anyway.)
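
To make the difference concrete, here is a minimal sketch that emulates
both behaviours on an interpreter whose integers are already unbounded.
The helper names and the 32-bit default width are assumptions made for
illustration; they are not part of the PEP or of any existing API.

```python
def hex_fixed_width(value, bits=32):
    """Emulate hex() on a machine int: reinterpret the bits as unsigned."""
    mask = (1 << bits) - 1  # assumption: a C long of `bits` bits
    return "0x%x" % (value & mask)


def hex_bignum(value):
    """Emulate hex() on a bignum: the sign is kept, unsignedness ignored."""
    sign = "-" if value < 0 else ""
    return "%s0x%x" % (sign, abs(value))


print(hex_fixed_width(-1))  # prints 0xffffffff
print(hex_bignum(-1))       # prints -0x1
```

For non-negative values the two helpers agree, so only code that formats
negative numbers would notice which rule a unified type picks.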

There may be more -- the PEP should strive to eventually list all
issues, although of course it needn't be complete at first checkin.

> C API
> 
>     All PyLong_AsX will call PyInt_AsX. If PyInt_AsX does not exist,
>     it will be added. Similarly PyLong_FromX. A similar path of
>     warnings as for the Python builtins will be followed.

Many C APIs for other datatypes currently take int or long arguments,
e.g. list indexing and slicing.  I suppose these could stay the same,
or should we provide ways to use longer integers from C as well?

Also, what will you do about PyInt_AS_LONG()?  If PyInt_Check()
returns true for bignums, C code that uses PyInt_Check() and then
assumes that PyInt_AS_LONG() will return a valid outcome is in for a
big surprise!  I'm afraid that we will need to think through the
compatibility strategy for C code more.
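
The hazard can be sketched in Python rather than C. The functions below
are hypothetical stand-ins, not real C API calls, and silent truncation
to the low bits is only one plausible failure mode for an unchecked
conversion; the 32-bit width is likewise an assumption.

```python
def fits_in_c_long(value, bits=32):
    """Return True if value is representable as a signed C long."""
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1


def unchecked_as_long(value, bits=32):
    """Assumed failure mode: keep only the low bits, as two's complement."""
    low = value & ((1 << bits) - 1)
    return low - (1 << bits) if low >> (bits - 1) else low


def checked_as_long(value, bits=32):
    """Safer variant: refuse values that a C long cannot hold."""
    if not fits_in_c_long(value, bits):
        raise OverflowError("value does not fit in a %d-bit C long" % bits)
    return value


print(unchecked_as_long(2 ** 31))  # prints -2147483648
```

A type check alone says nothing about the range, so existing callers
would need either the checked form or an explicit range test.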

> Overflows
> 
>     When an arithmetic operation on two numbers whose internal
>     representation is as machine-level integers returns something
>     whose internal representation is a bignum, a warning which is
>     turned off by default will be issued. This is only a debugging
>     aid, and has no guaranteed semantics.

Note that the implementation suggested below implies that the overflow
boundary is at a different value than currently -- you take one bit
away from the long.  For backwards compatibility I think that may be
bad...
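
As a quick worked example of the shift, assuming a 32-bit C long (the
width and the helper name are illustrative assumptions):

```python
def signed_range(bits):
    """Smallest and largest values of a two's-complement int of `bits` bits."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


WORD_BITS = 32  # assumption: a 32-bit C long

current = signed_range(WORD_BITS)     # every bit holds the value
tagged = signed_range(WORD_BITS - 1)  # one bit reserved as the union tag

print(current)  # prints (-2147483648, 2147483647)
print(tagged)   # prints (-1073741824, 1073741823)
```

Under these assumptions a value such as 2**30 is a machine int today but
would already be a bignum in the tagged scheme.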

> Implementation
> 
>     The PyInt type's slot for a C long will be turned into a 
> 
>            union {
>                long i;
>                digit digits[1];
>            };

Almost.  The current bignum implementation actually has a length field
first.

I have an alternative implementation in mind where the type field is
actually different for machine ints and bignums.  Then the existing
int representation can stay, and we lose no bits.  This may have other
implications though, since uses of type(x) == type(1) will be broken.
Once the type/class unification is complete, this could be solved by
making long a subtype of int.
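
A rough sketch of what that would mean for user code, using a
hypothetical Long class as a stand-in for a bignum type that subclasses
int (this assumes an interpreter where built-in types can be subclassed):

```python
class Long(int):
    """Hypothetical stand-in for a bignum type that is a subtype of int."""


big = Long(2 ** 40)

print(type(big) == type(1))  # prints False: the exact type differs
print(isinstance(big, int))  # prints True: the subtype check passes
```

Code that compares exact types would still need updating, but code
written in terms of "is this an int?" would keep working.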

>     Only the n-1 lower bits of the long have any meaning, the top
>     bit is always set. This distinguishes the union. All PyInt
>     functions will check this bit before deciding which types of
>     operations to use.

See above. :-(

> Jython Issues
> 
>     Jython will have a PyInt interface which is implemented by both
>     PyFixNum and PyBigNum.
> 
> 
> Copyright
> 
>     This document has been placed in the public domain.
> 
> 
> 
> Local Variables:
> mode: indented-text
> indent-tabs-mode: nil
> End:

All in all, a good start, but needs some work, Moshe!

--Guido van Rossum (home page: http://www.python.org/~guido/)