[Python-Dev] gc ideas -- dynamic profiling

Dima Tisnek dimaqq at gmail.com
Sat Dec 4 00:13:34 CET 2010


Python organizes objects into 3 generations: ephemeral, short-lived and long-lived.

When an object is created it is placed in ephemeral; if it lives long
enough, it is moved to short-lived, and so on.
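
For reference, a minimal sketch of how the three generations can be
inspected from Python itself. It uses only the standard gc module; the
helper name describe_generations is made up for illustration.

```python
import gc

def describe_generations():
    """Return one (threshold, count) pair per gc generation."""
    thresholds = gc.get_threshold()   # collection thresholds, youngest first
    counts = gc.get_count()           # current counters, youngest first
    return list(zip(thresholds, counts))

for gen, (threshold, count) in enumerate(describe_generations()):
    print(f"generation {gen}: threshold={threshold} count={count}")
```

The actual numbers depend on the interpreter version and on what the
program has allocated so far.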

q1: are generations placed in separate memory regions, or are all
generations in one memory region, with a pointer that
marks the boundary between generations?

I propose to track hot spots in Python, that is, contexts where most
allocations occur, and instrument these with counters that essentially
tell how often an object created there ends up killed in an ephemeral,
short-lived or long-lived garbage collector run, or is in fact still
alive. If a particular allocation context creates objects that are
likely to be long-lived, the allocator could skip the first 2
generations altogether (if generations are separate regions) or preload
the object with a high survival count (if the answer to q1 is a single
region).
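
A rough sketch of the per-context counters, in plain Python rather than
inside the allocator. Everything here is hypothetical: the class name
SiteStats, the outcome labels, and the min_samples and cutoff values are
assumptions for illustration, not anything CPython provides.

```python
from collections import Counter

# Hypothetical outcome labels: the collector run that freed the object,
# or "alive" if it has not been freed yet.
OUTCOMES = ("ephemeral", "short", "long", "alive")

class SiteStats:
    """Counters for one allocation context (illustrative only)."""

    def __init__(self):
        self.outcomes = Counter()

    def record(self, outcome):
        if outcome not in OUTCOMES:
            raise ValueError(f"unknown outcome: {outcome!r}")
        self.outcomes[outcome] += 1

    def long_lived_fraction(self):
        total = sum(self.outcomes.values())
        if total == 0:
            return 0.0
        # Objects that reached the oldest generation or are still alive.
        survivors = self.outcomes["long"] + self.outcomes["alive"]
        return survivors / total

    def should_skip_young_generations(self, min_samples=100, cutoff=0.9):
        """True if this context mostly produces long-lived objects."""
        total = sum(self.outcomes.values())
        return total >= min_samples and self.long_lived_fraction() >= cutoff

stats = SiteStats()
for _ in range(95):
    stats.record("alive")
for _ in range(5):
    stats.record("ephemeral")
print(stats.should_skip_young_generations())
```

The min_samples guard is there so that a context is not promoted on the
strength of a handful of observations.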

On the other hand, if we know where most allocations occur, we can
presume that most of these allocations are ephemeral; otherwise we
would run out of memory anyway. If this is indeed so, it makes my point
moot.

Implications are extra code to define the context, an extra pointer
back to the context from every allocation (or alternatively a weakref
from the allocation point to every object it generated) and real-time
accounting of what happens to these objects.
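
The weakref variant can be approximated at the Python level with
weakref.finalize. This is only a sketch under assumptions: the names
track, created and freed are made up, the context is a plain string, and
it only works for objects that support weak references.

```python
import gc
import weakref
from collections import Counter

created = Counter()   # context -> objects allocated there
freed = Counter()     # context -> objects already reclaimed

def _on_free(context):
    freed[context] += 1

def track(obj, context):
    """Register obj as allocated in context; count it again when it dies."""
    created[context] += 1
    # The callback must not reference obj, or obj would be kept alive.
    weakref.finalize(obj, _on_free, context)
    return obj

class Node:
    """Plain class; its instances support weak references."""

live = [track(Node(), "junk:45") for _ in range(3)]
temp = track(Node(), "junk:45")
del temp
gc.collect()  # redundant under CPython reference counting, but harmless

print(created["junk:45"], freed["junk:45"])
```

This records only that an object died, not which collector run killed
it; attributing deaths to a generation would need support inside the
collector itself.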

It should be possible to approach this problem statistically, that is,
instrument only every 100th object or so.
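
A minimal sketch of such sampling, assuming a simple counter is good
enough (the class name Sampler and the period of 100 are illustrative
assumptions):

```python
import itertools

class Sampler:
    """Select every period-th allocation for instrumentation."""

    def __init__(self, period=100):
        if period < 1:
            raise ValueError("period must be at least 1")
        self.period = period
        self._counter = itertools.count(1)

    def should_instrument(self):
        # True for allocation number period, 2*period, 3*period, ...
        return next(self._counter) % self.period == 0

sampler = Sampler(period=100)
sampled = sum(sampler.should_instrument() for _ in range(1000))
print(sampled)
```

Totals per context can then be estimated by multiplying the sampled
counts by the period. A fixed period can line up with regular allocation
patterns, so a randomized interval may be a safer choice in practice.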

Context could be simple, e.g. bytecode operation 3 on line 45 in
module "junk", or more complex, e.g. a call stack
str<-generator at 238382<-function "foo"<-function "boo"<-...; or even
patterns like ''' str<-any depth<-function "big" '''. Clearly, figuring
out what the hotspots are is already non-trivial; the more complex the
definition of context, the more it borders on the downright impossible.
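
To make the two simpler kinds of context concrete, here is a sketch
built on sys._getframe, which is CPython-specific. The function names
simple_context and stack_context and the depth limit are assumptions for
illustration; pattern matching over stacks is not attempted.

```python
import sys

def simple_context(depth=1):
    """(file name, line number, bytecode offset) of the calling frame."""
    frame = sys._getframe(depth)
    return (frame.f_code.co_filename, frame.f_lineno, frame.f_lasti)

def stack_context(depth=1, limit=4):
    """Names of up to limit enclosing functions, innermost first."""
    frame = sys._getframe(depth)
    names = []
    while frame is not None and len(names) < limit:
        names.append(frame.f_code.co_name)
        frame = frame.f_back
    return tuple(names)

def boo():
    return foo()

def foo():
    return stack_context()

print(simple_context())
print(boo()[:2])
```

Either tuple is hashable, so it can serve directly as a dictionary key
for per-context counters. Walking the stack on every allocation is
expensive, which is one more argument for sampling.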

p.s. can anyone share modern CPython profiling results to shed some
light on how important gc optimization really is?

