[Python-Dev] Memory management in the AST parser & compiler

Thomas Lee krumms at gmail.com
Wed Nov 16 15:15:20 CET 2005


Just messing around with some ideas. I was trying to avoid the ugly 
macros (note my earlier whinge about a learning curve) but they're the 
cleanest way I could think of to get around the problem without 
resorting to a mass deallocation right at the end of the AST run. Which 
may not be all that bad given we're going to keep everything in-memory 
anyway until an error occurs ... anyway, anyway, I'm getting sidetracked :)

The idea is to ensure that all allocations within a single function are 
made using the pool so that a function finishes what it starts. This 
way, if the function fails it alone is responsible for cleaning up its 
own pool and that's all. No funkyness needed for sequences, because each 
member of the sequence belongs to the pool too. Note that the stmt_ty 
instances are also allocated using the pool.

This breaks interfaces all over the place though. Not exactly a pretty 
change :) But yeah, maybe somebody smarter than I will come up with 
something a bit cleaner.

--

/* snip! */

#define AST_SUCCESS(pool, result) return result
#define AST_FAILURE(pool, result) asdl_pool_free(pool); return result

static stmt_ty
ast_for_try_stmt(struct compiling *c, const node *n)
{
    /* with the pool stuff, we wouldn't need to declare _all_ the variables
       here either. I'm just lazy. */

    asdl_pool *pool;
    int i;
    const int nch = NCH(n);
    int n_except = (nch - 3)/3;
    stmt_ty result_st = NULL, except_st = NULL;
    asdl_seq *body = NULL, *orelse = NULL, *finally = NULL;
    asdl_seq *inner = NULL, *handlers = NULL;

    REQ(n, try_stmt);

    /* c->pool is the parent of pool. when pool is freed
       (via AST_FAILURE), it is also removed from c->pool's list of 
children */
    pool = asdl_pool_new(c->pool);
    if (pool == NULL)
        AST_FAILURE(pool, NULL);

    body = ast_for_suite(c, CHILD(n, 2));
    if (body == NULL)
        AST_FAILURE(pool, NULL);

    if (TYPE(CHILD(n, nch - 3)) == NAME) {
        if (strcmp(STR(CHILD(n, nch - 3)), "finally") == 0) {
            if (nch >= 9 && TYPE(CHILD(n, nch - 6)) == NAME) {
                /* we can assume it's an "else",
                   because nch >= 9 for try-else-finally and
                   it would otherwise have a type of except_clause */
                orelse = ast_for_suite(c, CHILD(n, nch - 4));
                if (orelse == NULL)
                    AST_FAILURE(pool, NULL);
                n_except--;
            }

            finally = ast_for_suite(c, CHILD(n, nch - 1));
            if (finally == NULL)
                AST_FAILURE(pool, NULL);
            n_except--;
        }
        else {
            /* we can assume it's an "else",
               otherwise it would have a type of except_clause */
            orelse = ast_for_suite(c, CHILD(n, nch - 1));
            if (orelse == NULL)
                AST_FAILURE(pool, NULL);
            n_except--;
        }
    }
    else if (TYPE(CHILD(n, nch - 3)) != except_clause) {
        ast_error(n, "malformed 'try' statement");
        AST_FAILURE(pool, NULL);
    }

 if (n_except > 0) {
        /* process except statements to create a try ... except */
        handlers = asdl_seq_new(pool, n_except);
        if (handlers == NULL)
            AST_FAILURE(pool, NULL);

        for (i = 0; i < n_except; i++) {
            excepthandler_ty e = ast_for_except_clause(c, CHILD(n, 3 + i 
* 3),
                                                    CHILD(n, 5 + i * 3));
            if (!e)
                AST_FAILURE(pool, NULL);
            asdl_seq_SET(handlers, i, e);
        }

        except_st = TryExcept(pool, body, handlers, orelse, LINENO(n));
        if (except_st == NULL)
            AST_FAILURE(pool, NULL);

        /* if a 'finally' is present too, we nest the TryExcept within a
           TryFinally to emulate try ... except ... finally */
        if (finally != NULL) {
            inner = asdl_seq_new(pool, 1);
            if (inner == NULL)
                AST_FAILURE(pool, NULL);
            asdl_seq_SET(inner, 0, except_st);
            result_st = TryFinally(pool, inner, finally, LINENO(n));
            if (result_st == NULL)
                AST_FAILURE(pool, NULL);
        }
        else
            result_st = except_st;
    }
    else {
        /* no exceptions: must be a try ... finally */
        assert(orelse == NULL);
        assert(finally != NULL);
        result_st = TryFinally(pool, body, finally, LINENO(n));
        if (result_st == NULL)
            AST_FAILURE(pool, NULL);
    }

    /* pool deallocated when c->pool is deallocated */
    return AST_SUCCESS(pool, result_st);
}


Nick Coghlan wrote:

>Thomas Lee wrote:
>  
>
>>As the writer of the crappy code that sparked this conversation, I feel 
>>I should say something :)
>>    
>>
>
>Don't feel bad about it. It turned out the 'helpful' review comments from Neal 
>and I didn't originally work out very well either ;)
>
>With the AST compiler being so new, this is the first serious attempt to 
>introduce modifications based on it. It's already better than the old CST 
>compiler, but that memory management in the parser is a cow :)
>
>  
>
>>>Hopefully this is all made some sense.  =)  Is this the basic strategy
>>>that an arena setup would need?  if not can someone enlighten me?
>>>      
>>>
>
>I think we need to be explicit about the problems we're trying to solve before 
>deciding on what kind of solution we want :)
>
>1. Cleaning up after failures in symtable.c and compile.c
>   It turns out this is already dealt with in the case of code blocks - the 
>compiler state handles a linked list of blocks which it automatically frees 
>when the compiler state is cleaned up.
>   So the only rule that needs to be followed in these files is to *never* 
>call any of the VISIT_* macros while there is a Python object which requires 
>DECREF'ing, or a C pointer which needs to be freed.
>   This rule was being broken in a couple of places in compile.c (with respect 
>to strings). I was the offender in both cases I found - the errors date from 
>when this was still on the ast-branch in CVS.
>   I've fixed those errors in SVN, and added a note to the comment at the top 
>of compile.c, to help others avoid making the same mistake I did.
>   It's fragile in some ways, but it does work. It makes the actual 
>compilation code look clean (because there isn't any cleanup code), but it 
>also makes that code look *wrong* (because the lack of cleanup code makes the 
>calls to "compiler_new_block" look unbalanced), which is a little disconcerting.
>
>2. Parsing a token stream into the AST in ast.c
>   This is the bit that has caused Thomas grief (the PEP 341 patch only needs 
>to modify the front end parser). When building an AST node, each of the 
>contained AST nodes or sequences has to be built first. That means that, if 
>there's a problem with any of the later subnodes, the earlier subnodes need to 
>be freed.
>   The key problem with memory management in this module is that the free 
>method to be invoked is dependent on the nature of the AST node to be freed. 
>In the case of a node sequence, it is dependent on the nature of the contained 
>elements.
>   So not only do you have to remember to free the memory, you have to 
>remember to free it the *right way*.
>
>Would it be worth the extra memory needed to store a pointer to an AST node's 
>"free" method in the AST type structure itself? And do the same for ASDL 
>sequences?
>
>Then a simple FREE_AST macro would be able to "do the right thing" when it 
>came to freeing either AST nodes or sequences. In particular, ASDL sequences 
>would be able to free their contents without knowing what those contents 
>actually are.
>
>That wouldn't eliminate the problem with memory leaks or double-deletion, but 
>it would eliminate some of the mental overhead of dealing with figuring out 
>which freeing function to invoke.
>
>Cheers,
>Nick.
>
>  
>



More information about the Python-Dev mailing list