[Python-Dev] Memory management in the AST parser & compiler

Tue Nov 15 10:31:03 CET 2005

Transferring part of the discussion of Thomas Lee's PEP 341 patch to 
python-dev...

Neal Norwitz wrote in the SF patch tracker:
> Thomas, I hope you will write up this experience in coding
> this patch.  IMO it clearly demonstrates a problem with the
> new AST code that needs to be addressed.  ie, Memory
> management is not possible to get right.  I've got a 700+
> line patch to ast.c to correct many more memory issues
> (hopefully that won't cause conflicts with this patch).  I
> would like to hear ideas of how the AST code can be improved
> to make it much easier to not leak memory and be safe at the
> same time.

As Neal pointed out, it's tricky to write code for the AST parser and compiler 
without accidentally leaking memory when the parser or compiler runs into 
a problem and has to bail out of whatever it was doing. Thomas's patch reached 
v5 (based on Neal's review comments) with memory leaks still in it. My review 
got rid of some of them, and we think Neal's last review of v6 of the patch 
got rid of the rest.

I am particularly concerned about the returns hidden inside macros in the AST 
compiler's symbol table generation and bytecode generation steps. At the 
moment, every function in compile.c which allocates code blocks (or anything 
else for that matter) and then calls one of the VISIT_* macros is a memory 
leak waiting to happen.
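
To make the failure mode concrete, here is a minimal sketch of a hidden-return 
macro skipping a cleanup step. Every name in it (VISIT, visit_node, 
acquire_block, release_block, live_blocks, leaky_compile) is a hypothetical 
stand-in rather than the real compile.c code, and the "allocation" is only a 
counter so that the leak is easy to observe:

```c
#include <assert.h>

/* Hypothetical stand-ins; none of this is the real compile.c code. */
static int live_blocks = 0;   /* simulated allocations not yet released */

static void acquire_block(void) { live_blocks++; }

static void release_block(void)
{
    assert(live_blocks > 0);
    live_blocks--;
}

/* Pretend visitor: reports failure (0) for negative node values. */
static int visit_node(int node) { return node >= 0; }

/* The early return is hidden inside the macro. */
#define VISIT(node) \
    do { if (!visit_node(node)) return 0; } while (0)

static int leaky_compile(int node)
{
    acquire_block();
    VISIT(node);        /* on failure, returns here and skips the release */
    release_block();
    return 1;
}
```

With these definitions, a failing call such as leaky_compile(-1) returns 0 but 
leaves live_blocks at 1. The caller sees an ordinary error and has no handle 
with which to release the block.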

Something I've seen used successfully (and used myself) to deal with similar 
resource-management problems in C code is to use a switch statement, rather 
than getting goto-happy.

Specifically, the body of the entire function is written inside a switch 
statement, with 'break' then used as the equivalent of "raise Exception". For 
example:

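   PyObject* switchAsTry(void)
   {
     PyObject *result = NULL;
     switch(0) {
       default:
         /* Real function body goes here: it sets result on success
            and uses 'break' to signal an error */
         return result;
     }
     /* Error cleanup code goes here (reached only via 'break') */
     return NULL;
   }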

It avoids the potential for labelling problems that arises when gotos are 
used for resource cleanup. It's a far cry from real exception handling, but 
it's the best solution I've seen within the limits of C.

A particular benefit comes when macros which may abort function execution are 
used inside the function - if those macros are rewritten to use break instead 
of return, then the function gets a chance to clean up after an error.
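
As a rough sketch of how the break-based macros described above might look, 
the following uses the same kind of simulated allocation counter. All of the 
names (VISIT, visit_node, acquire_block, release_block, live_blocks, 
careful_compile) are assumptions made up for illustration, not the existing 
macros in compile.c or symtable.c:

```c
#include <assert.h>

/* Hypothetical stand-ins; none of this is the real compile.c code. */
static int live_blocks = 0;   /* simulated allocations not yet released */

static void acquire_block(void) { live_blocks++; }

static void release_block(void)
{
    assert(live_blocks > 0);
    live_blocks--;
}

/* Pretend visitor: reports failure (0) for negative node values. */
static int visit_node(int node) { return node >= 0; }

/* Uses break rather than return, so control falls out of the enclosing
   switch(0) and reaches the cleanup code. Deliberately not wrapped in
   do { } while (0): break would then only leave that loop. */
#define VISIT(node) \
    if (!visit_node(node)) break; else ((void)0)

static int careful_compile(int first, int second)
{
    int have_block = 0;

    switch (0) {
    default:
        acquire_block();
        have_block = 1;
        VISIT(first);
        VISIT(second);
        release_block();
        return 1;           /* success path never reaches the cleanup */
    }

    /* Error cleanup: reached only via break. */
    if (have_block)
        release_block();
    return 0;
}
```

One caveat with this approach: such a macro only works when it is expanded 
directly in the switch body. Inside a nested loop or nested switch, the break 
would leave only that inner construct and execution would carry on as if 
nothing had failed.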

Cheers,
Nick.

P.S. Getting rid of the flow control macros entirely is another option, of 
course, but it would make compile.c and symtable.c a LOT harder to follow. 
Raymond Chen's articles notwithstanding, a preprocessor-based mini-language 
does make sense in some situations, and I think this is one of them. 
Particularly since the flow control macros are private to the relevant 
implementation files.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://boredomandlaziness.blogspot.com