doing tricks with refcounts

Terry Reedy tjreedy at home.com
Tue Sep 25 14:58:32 EDT 2001


Note: I have crossposted this response describing refcounts in Python
to comp.lang.python in case there are any further comments, questions,
or corrections relevant to this subtopic.  Please delete for other
disjoint subtopics.

"Richard B. Kreckel" <Richard.Kreckel at GiNaC.DE> wrote in message
news:9onqf2$d6h$1 at bambi.zdv.Uni-Mainz.DE...
> Hi,
>
> I am currently experiencing with an idea I had a while ago and need
> advice about what we nowadays call "prior art".
>
> Consider a reference-counted system, where an object of class A
> handles objects of class B by usual reference counting.  I.e., class
B
> is equipped with a reference-count variable and objects of class A
> bump it up when they are copied or created and down when they are
> destroyed and when the associated B's reference-count drops to zero,
> the object of class A just deletes it.

Python (www.python.org, comp.lang.python) is an increasingly popular
object language that uses reference counting.  The implementors are
cognizant of the waste involved in duplicate copies of *immutable*
objects (those whose value cannot be changed).  Their approach is to
avoid unneeded duplicates rather than to try to fuse after the fact.

A.  Assignment (name-binding) copies references, not objects (or
values).  After
b = a
b refers to the same object as a.  If that object is immutable, this
is always the right thing to do.  If that object is mutable, and one
wants to change 'a' and 'b' independently, then it is not since one
then needs two (independent) objects.  To get such, one must
explicitly ask that 'b' be bound to a copy of (the object labelled)
'a'.  For lists,
b = a[:]
works.  For other mutable objects, there is a copy module.

B. As an optimization, the current interpreter preallocates 100 int
objects (which are immutable) with values -1 to 99.  For most
programs, most int object creation requests are efficiently handled by
returning a reference to one of these preallocated objects.  So
a = 39; b = 39
binds both 'a' and 'b' to the preallocated int with value 39, while
a = 139; b = 139
currently binds them to duplicate new int objects.

C. As a newer optimization, certain strings (also immutable in Python)
are 'interned' so that subsequent duplicate creation requests return a
reference to the same string object.  I believe that at least some
string comparisons begin by checking object identity.  I do not know
the details, but I am sure that there was some tuning to maximize
efficiency gain - overhead loss.

> Now assume this:
> - frequently one has two objects of class B that are equivalent.
> - there is a common operation f(a1,a2) on two objects of class A
that
>   leaves their B's invariant.
>
> When executing f(a1,a2), after a deep comparison of the
corresponding
> b1 and b2 has found them to be equal, it can look for
reference-counts
> that are equal to one.  Assume it finds b1's refcount to be one.  f
> can then delete b1 and link a1's reference to b2, increasing b2's
> reference-count.

In Python, the process of making a comparison may temporarily bump the
reference count up by 1.  Calling sys.refcount(object) definitely
does, so one has to be careful.

> If none of b1's and b2's reference-count is equal to 1, we can
imagine
> a more general scheme where the lower reference-count is further
> decreased in favor of the higher reference-count.
>
> It is obvious, that this will save some memory.  It may also save
> computing time.  The reason why it may save time is this: If
f(a1,a2)
> is a comparison function, a repeated call may not have to do a deep
> compare.  Instead, it just has to compare pointers.  How much is
> actually saved is hard to determine.  It depends on how frequent
> equivalent Bs are and how often f(a1,a2) is called.  Both parameters
> might be quite hard to determine.

All of which is why Python does not multiply objects except as
requested.

> Am I talking obvious and well-known and rather ancient stuff?  If
you
> think so, I would welcome a comment and a reference to the
literature
> or to an actual implementation.  Also, what could be the name of
this
> scheme?  Terms like "depletion" and "fusion" come into mind, but
they
> are just the result of desperate seach for proper terms...

When you define a  class in Python, you can optionally add a __cmp__
method to handle comparisons.  You could experiment with your fusion
idea within such user-defined classes.

 > Richard B. Kreckel
> <Richard.Kreckel at GiNaC.DE>
> <http://wwwthep.physik.uni-mainz.de/~kreckel/>

Terry J. Reedy





More information about the Python-list mailing list