regex help for a newbie

Diez B. Roggisch deetsNOSPAM at web.de
Mon Apr 5 09:03:08 EDT 2004


Marco Herrn wrote:

> I have the following string in my program:
> 
>   string= "aaa%(BBB%(CCC)BBB)aaa%(DDD)aaa"
> 
> Now I need to extract the parts that are enclosed in %().
> There are 3 levels of nesting. The first level is named
> 'aaa', the second 'BBB' and 'DDD' and the third 'CCC'.
> I do not need to extract the third level at this moment, since I extract
> the parts in a recursive function. So the thing I want to achieve here
> is to extract %(BBB%(CCC)BBB) and %(DDD).


Regexes aren't powerful enough for this: they have no counter, so they have
no way to keep track of how many open parentheses have already been found.
That means a regex alone can't reliably match nested groups like yours.
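
To make the limitation concrete, here is a small sketch of what happens when you try the two obvious patterns on the string from the question. The variable names (text, lazy, greedy) and both patterns are my own illustration, not something from the original post:

```python
import re

text = "aaa%(BBB%(CCC)BBB)aaa%(DDD)aaa"

# Non-greedy: stops at the first ")" and cuts the nested group short.
lazy = re.findall(r"%\(.*?\)", text)

# Greedy: runs on to the last ")" and merges both groups into one match.
greedy = re.findall(r"%\(.*\)", text)

print(lazy)    # ['%(BBB%(CCC)', '%(DDD)']
print(greedy)  # ['%(BBB%(CCC)BBB)aaa%(DDD)']
```

Neither result is the wanted pair %(BBB%(CCC)BBB) and %(DDD), because neither pattern knows which ")" belongs to which "%(".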

So what you need here is a parser that has state. You can either use one of
the existing parser frameworks (I personally use spark) or write one
yourself, as your problem is fairly simple:
 
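def parse(text):
    # Return each outermost %(...) group of text, delimiters included.
    # Assumes every ")" closes a "%(" group.
    groups = []
    level = 0  # current nesting depth
    start = 0  # index where the current outermost group begins
    for i, c in enumerate(text):
        if text.startswith("%(", i):
            if level == 0:
                start = i
            level += 1
        elif c == ")" and level > 0:
            level -= 1
            if level == 0:
                groups.append(text[start:i + 1])
    return groups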
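
Since the question mentions handling the levels in a recursive function, here is a rough sketch of how the same depth-tracking idea could build the whole nesting in one pass. The names parse_tree and walk are hypothetical, and the nested-list result format is just one possible choice; it assumes that every ")" found inside a group closes a "%(":

```python
def parse_tree(text):
    """Return text as a nested list: strings for plain runs,
    sub-lists for the contents of each %(...) group."""

    def walk(pos, depth):
        items = []  # parsed pieces at this nesting level
        buf = []    # characters of the current plain-text run
        while pos < len(text):
            if text.startswith("%(", pos):
                if buf:
                    items.append("".join(buf))
                    buf = []
                # Recurse into the group; resume after its closing ")".
                child, pos = walk(pos + 2, depth + 1)
                items.append(child)
            elif text[pos] == ")" and depth > 0:
                if buf:
                    items.append("".join(buf))
                return items, pos + 1
            else:
                buf.append(text[pos])
                pos += 1
        if depth > 0:
            raise ValueError("unclosed %( group")
        if buf:
            items.append("".join(buf))
        return items, pos

    tree, _ = walk(0, 0)
    return tree


print(parse_tree("aaa%(BBB%(CCC)BBB)aaa%(DDD)aaa"))
# ['aaa', ['BBB', ['CCC'], 'BBB'], 'aaa', ['DDD'], 'aaa']
```

Here the call stack plays the role of the level counter, which is exactly the kind of state a plain regex lacks.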
            
-- 
Regards,

Diez B. Roggisch


