[Python-Dev] FW: 64-bit port of Python

Trent Mick trentm@ActiveState.com
Wed, 9 Feb 2000 16:49:17 -0000


Let me define two different Python interpreter systems to help the
discussion.

- System A: Compiled with MSVC on a 64-bit Intel chip (i.e. LLP64 data
model, long is 32 bits).
- System B: Compiled with gcc on a 64-bit Intel chip (i.e. LP64 data model,
long is 64 bits).

Same hardware. Just different compiler (and possibly different OS).
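
To make that concrete, here is a trivial (non-Python) check; on the same
box, the MSVC build would print 4 and 8, the gcc build 8 and 8:

    #include <stdio.h>

    /* Same 64-bit hardware, different data model:
     *   LLP64 (MSVC/Win64): sizeof(long) == 4, sizeof(void*) == 8
     *   LP64  (gcc/Unix):   sizeof(long) == 8, sizeof(void*) == 8
     */
    int main(void)
    {
        printf("sizeof(long)  = %d\n", (int)sizeof(long));
        printf("sizeof(void*) = %d\n", (int)sizeof(void *));
        return 0;
    }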


First a couple of responses:

[Greg Stein]:
> In any case where Python needs to cast a pointer back/forth with an
> "integer", there are two new routines in Python 1.5.2. From longobject.h:
>
> extern DL_IMPORT(PyObject *) PyLong_FromVoidPtr Py_PROTO((void *));
> extern DL_IMPORT(void *) PyLong_AsVoidPtr Py_PROTO((PyObject *));
>
> I supplied the patch for these while I was also adding the 'P' format code
> for the "struct" module.
>
> The functions return a PyIntObject or a PyLongObject depending on whether
> the size of void pointer matches the size of a C long value. If a pointer
> fits in a long, you get an Integer. Otherwise, you get a Long.
>
> > > "Python/bltinmodule.c::899":
> > >
> > >   static PyObject *
> > >   builtin_id(self, args)
> > > 	PyObject *self;
> > > 	PyObject *args;
> > >   {
> > > 	PyObject *v;
> > >
> > > 	if (!PyArg_ParseTuple(args, "O:id", &v))
> > > 		return NULL;
> > > 	return PyInt_FromLong((long)v);
> > >   }
> >
> Assuming that we can say that id() is allowed to return a PyLongObject,
> then this should just use PyLong_FromVoidPtr. On most platforms, it will
> still return an Integer. For Win64 (and some other platforms), it will
> return a Long.
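
If I follow, the change would presumably look something like this (my own
untested sketch):

    static PyObject *
    builtin_id(self, args)
            PyObject *self;
            PyObject *args;
    {
            PyObject *v;

            if (!PyArg_ParseTuple(args, "O:id", &v))
                    return NULL;
            /* Returns a PyInt where a pointer fits in a C long
               (ILP32, LP64) and a PyLong where it does not (LLP64). */
            return PyLong_FromVoidPtr((void *)v);
    }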

This means that my System A and System B (above) get different result types
from id() just because their Python interpreters were compiled with
different data models. That sounds dangerous. Are there pickling portability
issues or external interface issues? I know that no one should really need
to pass converted pointer results between platforms, but... shouldn't two
Python interpreters running on identical hardware behave identically? That
seems to me the only (or safest) way to guarantee portability.

[Trent Mick]:
> > > If so, then the representation of the Python integer type will have
> > > to change (i.e. the use of 'long' cannot be relied upon). One should
> > > then carry through and change (or obsolete) the *_AsLong(),
> > > *_FromLong() Python/C API functions to become something like
> > > *_AsLargestNativeInt(), *_FromLargestNativeInt() (or some
> > > less bulky name).
> > >
> > > Alternatively, if the python integer type continues to use the C
> > > 'long' type for 64-bit systems then the following ugly thing
> > > happens:
> > >  - A Python integer on a 64-bit Intel chip compiled with MSVC is
> > >    32-bits wide.
> > >  - A Python integer on a 64-bit Intel chip compiled with gcc is
> > >    64-bits wide.
> > > That cannot be good.
[Greg Stein]:
> The problem is already solved (it's so much fun to borrow Guido's time
> machine!). Some of the C code just needs to catch up and use the new
> functionality, though.

How so? Do you mean with PyLong_{As|From}VoidPtr()?

I want to make a couple of suggestions about this 64-bit compatibility
stuff. I will probably sound like I am on glue but please bear with me and
let me try and convince you that I am not.

          *                 *               *

PyInt was tied to C's 'long' based on the (reasonable) assumption that 'long'
would be the largest native integral type on whatever platform Python was
running on (or, at least, I think that PyInt *should* be the largest native
int, otherwise it is arbitrarily limited). Hence, things like holding a
pointer and printing its value came for free. However, with the LLP64 data
model (which Microsoft has adopted for Win64) that intention is bastardized:
sizeof(long)==4 and sizeof(void*)==8.
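
In other words, the cast that id() (and friends) relies on is no longer
lossless. A contrived illustration, to be built on a 64-bit box:

    #include <stdio.h>

    /* Under LP64 a pointer round-trips through a long unchanged; under
     * LLP64 the high 32 bits can be silently dropped.
     */
    int main(void)
    {
        void *p;
        long  l;

        p = &p;             /* any pointer value will do */
        l = (long)p;        /* the cast id() currently performs */
        printf("round-trips through long: %s\n",
               ((void *)l == p) ? "yes" : "NO (high bits lost)");
        return 0;
    }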

Taking it as a given that Python should be made to run on the various 64-bit
platforms, there are two ways to deal with this:
1. Continue to base PyInt on 'long' and bolt on things like LONG_LONG,
PyLong_FromVoidPtr with return types of either PyInt or PyLong as necessary;
or
2. Add a level of typedef abstraction to decouple Python from the
no-longer-really-valid wish that 'long' is the largest native integer, and
couple PyInt to the actual largest native integral type.
3. (I know I said there were only two. Spanish Inquisition...:) Andrew Kuchling has
this all under control (as Tim intimated might be the case) or I am really
missing something.

I would like to argue for option number 2. C programmers use the various
integral types for different reasons. Simple uses: Use 'int' when you just
want a typical integer. Use 'long' when you need the range. Use 'short' when
you know the range is limited to 64k and you need to save space. More
specific uses: store a pointer in a 'long', or cast a pointer to 'long' to
print its decimal value with printf(). These uses all make assumptions
that can bite you when the data model (i.e. type sizes) changes.

What is needed is a level of abstraction away from the fundamental C types.
ANSI has defined some of this already (but maybe not enough). If you want to
store a pointer, use 'intptr_t' (or 'uintptr_t'). If you know the range is
limited to 64k, then use 'int16_t'. If you want the largest native integral
type, use something like 'intlongest_t'. If you know the range is limited to
64k, but you don't want to take the time hit for sign extension that
'int16_t' may imply, then use 'int16fast_t'. 'int16fast_t' and its kin (the
ugly name is mine) would be guaranteed to be at least as wide as the name
implies (i.e. 16 bits wide here), but could be larger if that would be
faster on the current system. These are the meanings I think C programmers
are really trying to express when they use short, int, and long.
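
For example (the names below are mine and purely illustrative, not existing
ANSI typedefs), the abstraction could live in a single configuration header
that picks the underlying C type per data model:

    /* Illustrative sketch only: one place where the data model is decided.
     * 'intlongest_t' and 'int16fast_t' are invented names, not ANSI's.
     */
    #if defined(_WIN64)                   /* LLP64: long stays 32 bits */
    typedef __int64            intlongest_t;
    typedef unsigned __int64   uintlongest_t;
    #else                                 /* LP64 and ILP32: long is the
                                             widest type we rely on */
    typedef long               intlongest_t;
    typedef unsigned long      uintlongest_t;
    #endif

    typedef int  int16fast_t;  /* at least 16 bits; wider if that is faster */
    typedef long int32fast_t;  /* at least 32 bits on every supported box   */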

On the Python/C API side, use things like:
 - PyInt would be tied to intlongest_t
 - extern DL_IMPORT(PyObject *) PyInt_FromLongest Py_PROTO((intlongest_t));
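
A constructor for the widened PyInt might then look roughly like this (a
sketch only; it ignores intobject.c's free-list machinery and assumes
ob_ival has been widened to intlongest_t):

    PyObject *
    PyInt_FromLongest(intlongest_t ival)
    {
            PyIntObject *v;

            /* Sketch: the real implementation would reuse the small-int
               cache and free list in Objects/intobject.c. */
            v = PyObject_NEW(PyIntObject, &PyInt_Type);
            if (v == NULL)
                    return NULL;
            v->ob_ival = ival;
            return (PyObject *)v;
    }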

"What?!," you say. "Trent, are you nuts? Why not just use 'int' then instead
of this ugly 'int16fast_t'?"  Well, just using 'int' carries the implicit
assumption that 'int' is at least 16-bits wide. I know that it *is* for any
reasonable system that Python is going to run on but: (1) the explicit
specification of the range is self documenting as to the intentions of the
author; and (2) the same argument applies to int*fast_t of other sizes where
the size assumption about 'int' may not be so cut-and-dry.

This opens up a can of worms. The first impression is to throw up your hands
and say that everything from printf formatters, to libc functions, to
external libraries, to PyArg_Parse() and Py_BuildValue() is based upon the
fundamental C types, and hence that it is not possible to slip in a level of
data type abstraction. I suppose I could be proven wrong, but I think it is
possible. The printf formatters can be manhandled to use the formatter you
want. The libc functions, on quick perusal, painfully try to do something
like what I am suggesting anyway, so they map fairly well. PyArg_Parse(),
etc. *could* be changed if that were necessary (*now* I *know* Guido thinks
I am nuts).
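
As an example of the formatter manhandling I mean (macro names invented for
illustration), something along these lines would do:

    #include <stdio.h>

    /* Illustration only: pick the printf length modifier alongside the
     * typedef, in the same configuration header.
     */
    #if defined(_WIN64)                 /* LLP64, MSVC */
    typedef __int64 intlongest_t;
    #  define LONGEST_FMT  "%I64d"      /* MSVC's 64-bit modifier */
    #else                               /* LP64 and ILP32 */
    typedef long intlongest_t;
    #  define LONGEST_FMT  "%ld"
    #endif

    int main(void)
    {
        int here;
        /* Print a pointer's decimal value without assuming it fits in a long. */
        printf("address: " LONGEST_FMT "\n", (intlongest_t)&here);
        return 0;
    }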

                *              *               *

This, I think, is the right approach for general data model portability.
However, because (1) it would require a lot of little patches and (2) it may
introduce some backward incompatibilities, I realize that it would never be
considered until at least Python 2.0.

If you are skeptical because it sounds like I am just talking and asking for
a volunteer to make these changes, it might help to know that I am
volunteering to work on this. (Yes, Tim. ActiveState *is* paying me to look
at this stuff.) I just want to see what the general reaction is to this: You
are going about this in the wrong way? Go for it? Yes, but...?


> or-if-activestate-solves-this-for-perl-first-we'll-just-rewrite-
>    python-in-that<wink>-ly y'rs  - tim
not-on-your-life-ly y'rs - Trent


Trent
trentm@ActiveState.com