[Python-Dev] Memory management in the AST parser & compiler

"Martin v. Löwis" martin at v.loewis.de
Mon Nov 28 22:37:05 CET 2005


Jeremy Hylton wrote:
 > The reason this thread started was the complaint that reference
 > counting in the compiler is really difficult.  Almost every line of
 > code can lead to an error exit.  The code becomes quite cluttered when
 > it uses reference counting.  Right now, the AST is created with
 > malloc/free, but that makes it hard to free the ast at the right time.
 >  It would be fairly complex to convert the ast nodes to pyobjects.
 > They're just simple discriminated unions right now.  If they were
 > allocated from an arena, the entire arena could be freed when the
 > compilation pass ends.

I haven't looked at the AST code at all so far, but my experience
with gcc is that such an approach is fundamentally flawed: you
would always have memory that ought to survive the parsing, so
you will have to copy it out of the arena. This will either lead
to dangling pointers, or garbage memory. So in gcc, they eventually
moved to a full garbage collector (after several iterations).
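
A minimal sketch of the copy-out problem described above, assuming a hypothetical bump allocator. The names Arena, arena_alloc, parse_name and compile_and_keep_name are made up for illustration and are not taken from CPython or gcc. Anything that must survive the compilation pass has to be copied to the heap before the arena is released; forgetting that copy leaves a dangling pointer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical bump allocator; none of these names come from CPython. */
typedef struct {
    char  *base;   /* start of the single backing block */
    size_t used;   /* bytes handed out so far */
    size_t cap;    /* total size of the backing block */
} Arena;

static int arena_init(Arena *a, size_t cap) {
    a->base = malloc(cap);
    a->used = 0;
    a->cap = cap;
    return a->base != NULL;
}

/* Hand out n bytes from the block, or NULL when it is exhausted. */
static void *arena_alloc(Arena *a, size_t n) {
    size_t aligned = (n + 7u) & ~(size_t)7u;   /* round up to 8 bytes */
    assert(a->base != NULL);
    if (aligned < n || aligned > a->cap - a->used)
        return NULL;
    void *p = a->base + a->used;
    a->used += aligned;
    return p;
}

/* Free every allocation at once; all pointers into the arena die here. */
static void arena_free(Arena *a) {
    free(a->base);
    a->base = NULL;
    a->used = a->cap = 0;
}

/* Stand-in for a parse step: the identifier lives inside the arena. */
static char *parse_name(Arena *a, const char *src) {
    size_t n = strlen(src) + 1;
    char *name = arena_alloc(a, n);
    if (name != NULL)
        memcpy(name, src, n);
    return name;
}

/* Returns a heap copy that outlives the arena; the caller frees it. */
static char *compile_and_keep_name(const char *src) {
    Arena a;
    char *kept = NULL;
    if (!arena_init(&a, 1024))
        return NULL;
    char *name = parse_name(&a, src);
    if (name != NULL) {
        size_t n = strlen(name) + 1;
        kept = malloc(n);            /* the copy-out step */
        if (kept != NULL)
            memcpy(kept, name, n);
    }
    arena_free(&a);                  /* 'name' is dangling from here on */
    return kept;
}
```

Returning name instead of kept would compile without complaint, which is the kind of error that is easy to make once some data has to outlive the arena.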

Reference counting has the advantage that you can always DECREF
at the end of the function. So if you put all local variables
at the beginning of the function, and all DECREFs at the end,
getting clean memory management should be doable, IMO. Plus,
contributors would be familiar with the scheme in place.
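
The "locals at the top, DECREFs at the end" discipline can be sketched as follows. To stay self-contained, the sketch uses a toy refcounted type instead of PyObject; Obj, obj_new, obj_xdecref, add_values and the fail_at parameter are assumptions for illustration only. obj_xdecref plays the role that Py_XDECREF plays in the real C API: it tolerates NULL, so every local can be released unconditionally at the single exit.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy refcounted object standing in for PyObject; names are hypothetical. */
typedef struct { int refcnt; int value; } Obj;

static int live_objects = 0;   /* leak detector for the demo */

static Obj *obj_new(int value) {
    Obj *o = malloc(sizeof *o);
    if (o == NULL)
        return NULL;
    o->refcnt = 1;
    o->value = value;
    live_objects++;
    return o;
}

/* Like Py_XDECREF: accepts NULL, frees on the last reference. */
static void obj_xdecref(Obj *o) {
    if (o != NULL && --o->refcnt == 0) {
        live_objects--;
        free(o);
    }
}

/* Adds x and y via temporary objects. fail_at simulates an allocation
   failure at step 0, 1 or 2 (any other value means no failure).
   Returns 0 on success, -1 on error. */
static int add_values(int x, int y, int fail_at, int *out) {
    /* All owned locals are declared up front and start out as NULL. */
    Obj *a = NULL, *b = NULL, *sum = NULL;
    int rc = -1;

    assert(out != NULL);
    a = (fail_at == 0) ? NULL : obj_new(x);
    if (a == NULL)
        goto done;
    b = (fail_at == 1) ? NULL : obj_new(y);
    if (b == NULL)
        goto done;
    sum = (fail_at == 2) ? NULL : obj_new(a->value + b->value);
    if (sum == NULL)
        goto done;

    *out = sum->value;
    rc = 0;

done:
    /* Single exit: every local is released here, whether set or not. */
    obj_xdecref(sum);
    obj_xdecref(b);
    obj_xdecref(a);
    return rc;
}
```

Every error path jumps to the same label, so adding another temporary means one more declaration at the top and one more release at the bottom.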

I don't know if details have already been proposed, but I would
update asdl to generate a hierarchy of classes, e.g.

class mod(object): pass

class Module(mod):
   def __init__(self, body):
     self.body = body # List of stmt

#...

class Expression(mod):
   def __init__(self, body):
     self.body = body # expr

# ...
class Raise(stmt):
   def __init__(self, dest, values, nl):
      self.dest = dest     # expr or None
      self.values = values # List of expr
      self.nl = nl         # bool (True or False)

There would be convenience functions, like

   PyObject *mod_Module(PyObject* body);
   enum mod_kind mod_kind(PyObject* mod);
   // Module, Interactive, Expression, or mod_INVALID
   PyObject *mod_Expression_body(PyObject*);
   //...
   PyObject *stmt_Raise_dest(PyObject*);

(whether the accessors return a new or a borrowed reference
  could be debated; plain C struct accesses would also
  be possible)
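
The difference between the two accessor conventions can be sketched with a toy node type. Node, node_new, node_decref and the two Expression_body_* variants are hypothetical names chosen for this illustration; they are not the generated API, and the real objects would be PyObject pointers.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy AST node standing in for a PyObject; all names are hypothetical. */
enum node_kind { node_INVALID = 0, Expression_kind, Name_kind };

typedef struct Node {
    int refcnt;
    enum node_kind kind;
    struct Node *body;   /* reference owned by this node, may be NULL */
} Node;

static Node *node_new(enum node_kind kind, Node *body) {
    Node *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->refcnt = 1;
    n->kind = kind;
    n->body = body;
    if (body != NULL)
        body->refcnt++;  /* the parent takes its own reference */
    return n;
}

static void node_decref(Node *n) {
    if (n != NULL && --n->refcnt == 0) {
        node_decref(n->body);
        free(n);
    }
}

/* Kind query that tolerates NULL, similar in spirit to mod_kind(). */
static enum node_kind node_kind_of(const Node *n) {
    return n != NULL ? n->kind : node_INVALID;
}

/* Borrowed reference: no refcount change; valid only while mod is alive. */
static Node *Expression_body_borrowed(Node *mod) {
    if (node_kind_of(mod) != Expression_kind)
        return NULL;
    return mod->body;
}

/* New reference: the caller owns it and must call node_decref() later. */
static Node *Expression_body_new(Node *mod) {
    Node *body = Expression_body_borrowed(mod);
    if (body != NULL)
        body->refcnt++;
    return body;
}
```

A borrowed reference saves a matching decref at every call site but must not be used after the parent is released; a new reference costs one more line in the cleanup section and remains valid on its own.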

Regards,
Martin

