re question - finding matching ()

Christophe Delord no.spam
Sun Jan 18 13:21:18 EST 2004


On 18 Jan 2004 07:51:38 -0800, Miki Tebeka wrote:

> Hello All,
> 
> To all of you regexp gurus out there...
> 
> I'd like to find all of the sub strings in the form "add(.*)"
> The catch is that I might have () in the string (e.g. "add((2 * 2),
> 100)"), 
> 
> Currently I can only get the "add((2 * 2)" using
> re.compile("\w+\([^\)]*\)"). To solve the problem a hand crafted
> search is used :-(
> 
> Is there a better way?
> 
> Thanks.
> Miki


Hello,

You need "recursive patterns" to do this, but classical regular
expressions, including those of Python's re module, cannot handle them.
You can simulate a recursive pattern by limiting the nesting depth. For
example, at level 0 the expression inside () is [^\(\)]+ (no parentheses
at all). At level 1 it may also contain parenthesised groups that are
not nested any further: (?:\([^\(\)]+\)|[^\(\)]+)* and so on.
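
To make the two levels concrete, here is a small sketch with both
patterns written out by hand. The names LEVEL0, LEVEL1, call_re and
is_call are made up for this illustration, and re.fullmatch assumes
Python 3.4 or later.

```python
import re

# Level 0: text without any parentheses.
LEVEL0 = r'[^\(\)]+'
# Level 1: plain text mixed with (...) groups that are not nested further.
LEVEL1 = r'(?:\(%s\)|%s)*' % (LEVEL0, LEVEL0)

# A "call": a name followed by a parenthesised argument list.
call_re = re.compile(r'\w+\(%s\)' % LEVEL1)

def is_call(text):
    """Return True if the whole of text matches call_re."""
    return call_re.fullmatch(text) is not None

print(is_call("add(2, 100)"))        # no inner parentheses
print(is_call("add((2*2), 100)"))    # one inner pair
print(is_call("add(((2*2)), 100)"))  # two nested pairs: too deep for LEVEL1
```

The first two calls print True and the last one prints False, because
LEVEL1 allows only one pair of parentheses inside the outer pair.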

You can build such an expression recursively:

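import re

def make_par_re(level=6):
    if level < 1:
        # Level 0: text without any parentheses.
        return r'[^\(\)]+'
    else:
        # Level n: plain text mixed with (...) groups of level n-1.
        return r'(?:\(%s\)|%s)*'%(make_par_re(level-1), make_par_re(0))

# A name followed by a parenthesised argument list.
par_re = re.compile(r"\w+\(%s\)"%make_par_re())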

But in this case you are limited to 6 levels of nesting inside the
outer parentheses.

Now you can try this:

text = ("add((2*2), 100)  some text sub(a, b*(10-c), "
        "f(g(a,b), h(c, d)))")
for m in par_re.findall(text):
    print(m)

I don't really like this solution because the expressions are ugly (try
print(make_par_re(6))).

Anyway a better solution would be to use a syntactic parser. You can
write your own by hand or make your choice here:
http://www.python.org/sigs/parser-sig/
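
As an illustration of the "by hand" option, here is a minimal sketch
that finds each "name(" with a regular expression and then counts
parentheses to locate the matching ")". The names OPEN_RE and
find_calls are made up for this example, and it assumes that
parentheses never appear inside string literals or comments in the
text being searched.

```python
import re

# Start of a call: a name immediately followed by "(".
OPEN_RE = re.compile(r'\w+\(')

def find_calls(text):
    """Return every outermost name(...) substring with balanced parentheses."""
    calls = []
    pos = 0
    while True:
        m = OPEN_RE.search(text, pos)
        if m is None:
            break
        depth = 1          # the "(" matched by OPEN_RE is already open
        i = m.end()
        while i < len(text) and depth > 0:
            if text[i] == '(':
                depth += 1
            elif text[i] == ')':
                depth -= 1
            i += 1
        if depth == 0:
            # Balanced: keep the substring and resume after it.
            calls.append(text[m.start():i])
            pos = i
        else:
            # Never closed: skip this "(" and keep looking.
            pos = m.end()
    return calls

text = ("add((2*2), 100)  some text sub(a, b*(10-c), "
        "f(g(a,b), h(c, d)))")
for call in find_calls(text):
    print(call)
```

Unlike the regular expression, this has no limit on the nesting depth,
and it only reports the outermost calls.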



Best regards,
Christophe.


