[Tutor] object size in python is in what units?

Tue Jul 23 16:38:20 CEST 2013

On 23/07/13 18:17, Jim Mooney wrote:
> On 23 July 2013 00:40, Steven D'Aprano <steve at pearwood.info> wrote:
>
>>
>> No no no, a thousand times no!!! IDs are just numeric IDs, that is all,
>> like your social security number or driver's licence number. Don't think of
>> them as having any meaning at all, except that they are unique during the
>> lifetime of the object.
>>
>
> Okay, ID stands for a location fixed until the object disappears, but we
> don't know where that location is.

No, it does not stand for a location. It is an identification number, that is all.

In CPython, every value (lists, strings, integers, custom classes) is an object which exists in one location until destroyed. In Jython and IronPython, values are objects which can move from location to location, in order to free up memory for new objects. In PyPy, not only can objects move, but they can also be invisibly turned into low-level data structures for efficient processing. So in CPython, Jython and IronPython, an object's identity is fixed. In PyPy, object identity is an illusion, and according to the PyPy developers, keeping that illusion working requires a lot of effort.
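As a minimal sketch of treating an ID as nothing more than an identification number (the variable names are made up for illustration, and Python 3 is assumed): two names bound to the same object share an ID, while an equal but separate object alive at the same time has a different one.

```python
a = [1, 2, 3]
b = a          # a second name for the same object
c = [1, 2, 3]  # an equal value, but a distinct object

# "is" compares identity; id() gives the identification number.
same_object = (a is b) and (id(a) == id(b))
distinct_objects = (a is not c) and (id(a) != id(c))

print(same_object, distinct_objects)
```

The code compares the numbers and never interprets them, which is the only use that works on every implementation.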

> But what about object offsets from self?

That's always zero, since self is the object :-)

You can read up about the implementation of CPython here:

http://docs.python.org/2/c-api/

but this describes only the C interface, not the concrete implementation. For that you pretty much have to read the source code. The C source code.

> Is the beginning of self.spam always the same distance from the beginning
> of self.eggs? Or can I just forget the hard ground of assembler-metaphors
> entirely as I float off into abstractville? I guess the fixed lengths I
> kept getting on re-runs were coincidental but not to be relied on.

Understanding a little of the concrete implementation details will help you understand some of Python's design, and its limitations. But apart from that, it's not necessary, and sometimes it is actually counter-productive. Thinking in terms of low-level data can lead you astray. For instance, when working in low-level languages, it is normal to write code to minimize moving data from place to place: when sorting in C or assembly, comparing two values is cheap, but swapping them is expensive. In Python it is the opposite: swapping two values in a list just requires swapping two pointers, which is cheap, but the comparison is quite expensive and could call arbitrary user code.
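A rough sketch of that difference, assuming Python 3 and a hypothetical class named Noisy invented for this example: swapping two list items only rebinds two slots, while every comparison made during a sort runs the class's own `__lt__` method.

```python
class Noisy:
    """Hypothetical class whose comparisons run user-written Python code."""
    calls = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Noisy.calls += 1  # arbitrary code runs on every comparison
        return self.value < other.value


items = [Noisy(3), Noisy(1), Noisy(2)]
first, second = items[0], items[1]

# Swapping rebinds two list slots; neither object is copied.
items[0], items[1] = items[1], items[0]
swapped_without_copy = (items[0] is second) and (items[1] is first)

items.sort()  # list.sort compares elements with __lt__
sorted_values = [item.value for item in items]
print(swapped_without_copy, sorted_values, Noisy.calls)
```

How many comparisons the sort makes depends on the input, but each one is a full method call, which is why cutting down comparisons usually matters more than cutting down swaps.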

In CPython, objects are implemented in C, and are a block of memory that contains various fields, and potentially pointers to other objects. The nature of those fields will depend on which type of object they are.
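To tie this back to the subject line: `sys.getsizeof` reports an object's size in bytes, counting only the object itself and not the objects it refers to. The sketch below assumes Python 3, and the sample values are arbitrary. The exact numbers differ between implementations, versions and platforms, so no expected output is shown.

```python
import sys

# Arbitrary sample objects of different types.
samples = {
    "int": 0,
    "float": 1.5,
    "str": "spam",
    "empty list": [],
    "list of 100": [None] * 100,
}

# Size in bytes of each object itself, excluding anything it points to.
sizes = {label: sys.getsizeof(obj) for label, obj in samples.items()}

for label, size in sizes.items():
    print(label, size, "bytes")
```

On CPython the longer list reports a larger size because it holds more pointers, not because the objects it points to are counted.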

In Jython, objects are implemented in Java, and in IronPython they are implemented in C#, but otherwise are similar. But the fields themselves, and their layout, will be different, except perhaps by accident.

In PyPy, objects are implemented in RPython, a restricted and simplified version of Python, which is then automatically translated by the optimizing compiler into some low-level data structure behind the scenes.
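If you are curious which of these implementations is running your code, the standard library can tell you. A small sketch, assuming Python 3:

```python
import platform
import sys

# Name of the running implementation, such as "CPython" or "PyPy".
implementation = platform.python_implementation()

print(implementation, sys.version_info[:3])
```

Well-behaved Python code should rarely need to branch on this value, since the language's behaviour is meant to be the same on all of them.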

Since Python is defined by its behaviour, not any sort of concrete implementation, in principle one could invent a "Python Engine", like Babbage's Difference Engine, only thousands of times more complex, entirely out of clockwork. Or out of water flowing through pipes, or DNA reproducing in a test tube, or some exotic quantum field. Or you could simply simulate a Python interpreter in your head, perhaps aided by pencil and paper.

-- 
Steven