Python vs Ruby

Tim Peters tim.one at home.com
Tue Jan 30 01:16:31 EST 2001


[Neil Schemenauer]
> RC also works quite well with data caches and VM managers.  Using
> the Boehm-Demers GC with Python and disabling reference counting
> slows the interpreter down by a significant amount.

[D-Man]
> Really?  This is interesting.  I had thought that it would be faster.
> At the very least, the simple assignment statment can be simple.  With
> RC, both arguments' ref counts have to be checked first and then
> updated appropriately.

So consider this:

>>> from random import choice
>>> addresses = {}
>>> letters = "abcdefghijklmnopqrstuvwxyz"
>>> for i in xrange(100000):
...     s = choice(letters) + " " + choice(letters)
...     addresses[id(s)] = 1
...
>>> len(addresses)
9
>>>

Thanks to refcounting, the storage for "s" is reclaimed on each iteration,
and overwhelmingly most iterations manage to reuse the storage left behind
from a preceding iteration.  As Neil said, that's very cache-friendly.
Without refcounting, you're likely to stomp on new memory addresses each
iteration until some limit is reached, then gc will go lurching all over a
large blob of memory at least once (possibly more) cleaning it up.  That's
cache-hostile.

Python has extremely high dynamic memory throughput, because *nothing* is
exempted from the "everything's an object, and all objects are
heap-allocated" scheme.  Mark-&-sweep-- and especially simple M&S --is
happier with modest rates of memory turnover.

> ...
> All the tests I've seen of the Boehm collector used C programs
> originally designed for malloc/free, not using any ref counting schemes.
> Maybe this has an effect?

No C program I know of does dynamic allocation on virtually every line.
Python programs do.  C++ is somewhere between them, but closer to the C end.

> I wonder if python was designed for the collector instead of ref
> counting if this would improve its performance . . .

There's no sense in which Python was designed "for" ref counting.  The
single highest-payoff thing it could do here, regardless of garbage
collection scheme, would be to grow a lot of hair to *exempt* some objects
from participating in the gc scheme du jour.  For example, in a language
with more sanely static semantics, it would be easy to see in the example
above that there's no need to allocate an integer object for the result of
id(s) each time through the loop -- it could just store that result on a
stack (or in a register) and exempt it entirely from refcounting (or M&S, or
whatever).  Similarly for the loop index "i", and the function object
"choice".

BTW, note that Perl also does refcounting, and for the same reasons.  The
idea that fancier gc schemes must be faster is ... an idea <0.9 wink>.

if-anything-about-gc-seems-obvious-it's-an-illusion-ly y'rs  - tim





More information about the Python-list mailing list