[Python-Dev] stackable ints [stupid idea (ignore) :v]

Sat Jun 12 08:19:37 CEST 1999

[Aaron, describes a scheme where objects are represented by a fixed-size
 (typecode, variant)
 pair, where if the typecode is e.g. INT or FLOAT the variant is the
 value directly instead of a pointer to the value]

[Guido]
> What you're describing is very close to what I recall I once read
> about the runtime organization of Icon.

At the lowest level it's exactly what Icon does.  It does *not* exempt ints
from Icon's flavor of dynamic memory management, but Icon doesn't use
refcounting -- it uses compacting mark-&-sweep across some 5 distinct
regions each with their own finer-grained policies (e.g., strings are
central to Icon and so it manages the string region a little differently;
and Icon coroutines save away pieces of the platform's C stack so need
*very* special treatment).

So:

1) There are no incref/decref expenses anywhere in Icon.

2) Because of compaction, all allocations cost the same and are dirt cheap:
just increment the appropriate region's "avail" pointer by the number of
bytes you need.  If there aren't enough bytes, run GC and try again.  If
there still aren't enough bytes, Icon usually shuts down (it's not good at
asking the OS for more memory!  it carves up its initial memory in pretty
rigid ways, and relies on tricks like comparing storage addresses to speed
M&S and compaction -- those "regions" are in a fixed order relative to each
other, so new memory can't be tacked on to a region except at the low and
high ends).

3) All the expense is in finding and compacting live objects, so in an odd
literal sense cleaning up trash comes for free.

4) Icon has no finalizers, so it doesn't need to identify or preserve
trash -- compaction simply overwrites "the holes" where the trash used to
be.

Icon is nicely implemented, but it's a "self-contained universe" view of the
world and its memory approach makes life hard for the tiny handful of folks
who have *tried* to make it extendable via C.  Icon is also purely
procedural -- no OO, no destructors, no resurrection.

Irony:  one reason I picked up Python in '91 is that my int-fiddling code
was too slow in Icon!  Even Python 0.9.0 ran int algorithms significantly
faster than the 10-years-refined Icon implementation of that time.  Never
looked into why, but now that Aaron brought up the issue I find it very
surprising!  Those algorithms had a huge rate of int trash creation, but
very few persistent objects, so Icon's M&S should have run like the wind.
And Icon's allocation is dirt-cheap (at least as fast as Python's fastest
special-purpose allocators), and didn't have any refcounting expenses
either.

There's an important lesson *somewhere* in that <wink>.  Maybe it was the
fault of Icon's "goal-directed" expression evaluation, constantly asking
"did this int succeed or fail?", "did that add suceed or fail?", etc.

> ...
> The Icon approach (i.e. yours) seems to require a complete rethinking
> of all object implementations and all APIs at the C level -- perhaps
> we could think about it for Python 2.0.  Some ramifications:
>
> - Uses more memory for highly shared objects (there are as many copies
> of the type pointer as there are references).

Actually more than that in Icon:  if the "variant" part is a pointer, the
first word of the block it points to is also a copy of the typecode (turns
out the redundancy speeds the GC).

> - Thus, lists take double the memory assuming they reference objects
> that also exist elsewhere.  This affects the performance of slices
> etc.
>
> - On the other hand, a list of ints takes half the memory (given that
> most of those ints are not shared).

Isn't this 2/3 rather than 1/2?  I'm picturing a list element today as
essentially a pointer to a type object pointer + int (3 units in all), and a
type object pointer + int (2 units in all) "tomorrow".  Throw in refcounts
too and the ratio likely gets closer to 1.

> - *Homogeneous* lists (where all elements have the same type --
> i.e. arrays) can be represented more efficiently by having only one
> copy of the type pointer.  This was an idea for ABC (whose type system
> required all container types to be homogenous) that was never
> implemented (because in practice the type check wasn't always applied,
> and the top-level namespace used by the interactive command
> interpreter violated all the rules).

Well, Python already has homogeneous int lists (array.array), and while they
save space they suffer in speed due to needing to wrap raw ints "in an
object" upon reference and unwrap them upon storage.

> - Reference count manipulations could be done by a macro (or C++
> behind-the-scense magic using copy constructors and destructors) that
> calls a function in the type object -- i.e. each object could decide
> on its own reference counting implementation :-)

You don't need to switch representations to get that, though, right?  That
is, I don't see anything stopping today's type objects from growing
__incref__ and __decref__ slots -- except for common sense <wink>.

An apparent ramification I don't see above that may actually be worth
something <wink>:

- In "i = j + k", the eval stack could contain the ints directly, instead of
pointers to the ints.  So fetching the value of i takes two loads (get the
type pointer + the variant) from adjacent stack locations, instead of
today's load-the-pointer + follow-the-pointer (to some other part of
memory); similarly for fetching the value of j.  Then the sum can be stored
*directly* into the stack too, without today's need for allocating and
wrapping it in "an int object" first.

Possibly happy variant:  on top of the above, *don't* exempt ints from
refcounting.  Let 'em incref and decref like everything else.  Give them an
intial refcount of max_count/2, and in the exceedingly unlikely event a
decref on an int ever sees zero, the int "destructor" simply resets the
refcount to max_count/2 and is otherwise a nop.

semi-thinking-semi-aloud-ly y'rs  - tim