Python's garbage collection was Re: Python reliability

Tom Anderson twic at urchin.earth.li
Wed Oct 12 19:38:41 EDT 2005


On Wed, 12 Oct 2005, Jorgen Grahn wrote:

> On Mon, 10 Oct 2005 20:37:03 +0100, Tom Anderson <twic at urchin.earth.li> wrote:
>> On Mon, 10 Oct 2005, it was written:
> ...
>>> There is no way you can avoid making garbage.  Python conses everything,
>>> even integers (small positive ones are cached).
>>
>> So python doesn't use the old SmallTalk 80 SmallInteger hack, or similar?
>
> If the SmallInteger hack is something like this, it does:
>
>>>> a = 42
>>>> b = 42
>>>> a is b
> True
>>>> a = 42000
>>>> b = 42000
>>>> a is b
> False
>>>>
>
> ... which I guess is what if referred to above as "small positive
> ones are cached".

That's not what i meant.

In both smalltalk and python, every single variable contains a reference 
to an object - there isn't the object/primitive distinction you find in 
less advanced languages like java.

Except that in smalltalk, this isn't true: in ST, every variable *appears* 
to contain a reference to an object, but implementations may not actually 
work like that. In particular, SmallTalk 80 (and some earlier smalltalks, 
and all subsequent smalltalks, i think) handles small integers (those that 
fit in wordsize-1 bits) differently: all variables contain a word, whose 
bottom bit is a tag bit; if it's one, the word is a genuine reference, and 
if it's zero, the top bits of the word contain a signed integer. The 
innards of the VM know about this (where it matters), and do the right 
thing. All this means that small (well, smallish - up to a billion!) 
integers can be handled with zero heap space and much reduced instruction 
counts. Of course, it means that references are more expensive, since they 
have to be checked for integerness before dereferencing, but since this is 
a few instructions at most, and since small integers account for a huge 
fraction of the variables in most programs (as loop counters, array 
indices, truth values, etc), this is a net win.

See the section 'Representation of Small Integers' in:

http://users.ipa.net/~dwighth/smalltalk/bluebook/bluebook_chapter26.html#TheObjectMemory26

The precise implementation is sneaky - the tag bit for an integer is zero, 
so in many cases you can do arithmetic directly on the word, with a few 
judicious shifts here and there; the tag bit for a pointer is one, and the 
pointer is stored in two's-complement form *with the bottom bit in the 
same place as the tag bit*, so you can recover a full-length pointer from 
the word by complementing the whole thing, rather than having to shift. 
Since pointers are word-aligned, the bottom bit is always a zero, so in 
the complement it's always a one, so it can also be the status bit!

I think this came from LISP initially (most things do) and was probably 
invented by Guy Steele (most things were).

tom

-- 
That's no moon!



More information about the Python-list mailing list