parenthesis

Michele Simionato mis6 at pitt.edu
Mon Nov 4 15:24:31 EST 2002


Suppose I want to parse the following expression:

>>> exp='(a*(b+c*(2-x))+d)+f(s1)'

I want to extract the first part, i.e. '(a*(b+c*(2-x))+d)'.

Now if I use a greedy regular expression

>>> import re; greedy=re.compile(r'\(.*\)')

I get too much, namely the full expression:

>>> match=greedy.search(exp); match.group()
'(a*(b+c*(2-x))+d)+f(s1)'

On the other hand, if I use a non-greedy regular expression

>>> nongreedy=re.compile(r'\(.*?\)')

I get too little:

>>> match=nongreedy.search(exp); match.group()
'(a*(b+c*(2-x)'
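Both results follow from where each quantifier stops: the greedy `.*` backs off only as far as the last `)` in the whole string, while the non-greedy `.*?` stops at the first `)` it meets. Neither one knows anything about nesting. A quick check of that claim, as a sketch using only the standard `re` module and the `exp` string from above:

```python
import re

exp = '(a*(b+c*(2-x))+d)+f(s1)'

greedy = re.search(r'\(.*\)', exp).group()
nongreedy = re.search(r'\(.*?\)', exp).group()

# Greedy: from the first '(' to the LAST ')' anywhere in the string.
assert greedy == exp[exp.index('('):exp.rindex(')') + 1]
# Non-greedy: from the first '(' to the FIRST ')' after it.
assert nongreedy == exp[exp.index('('):exp.index(')') + 1]

print(greedy)     # (a*(b+c*(2-x))+d)+f(s1)
print(nongreedy)  # (a*(b+c*(2-x)
```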

Is there a clever regular expression able to match the first
parenthesized group? What I did instead was to write a routine
to extract the first parenthesized group:

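def parenthesized_group(exp):
    # Assumes exp starts with '('; stops at the ')' that closes it.
    nesting_level, out = 0, []
    for c in exp:
        out.append(c)
        if c == '(':
            nesting_level += 1
        elif c == ')':
            nesting_level -= 1
        if nesting_level == 0:   # back to depth 0: the first group is closed
            break
    return ''.join(out)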

>>> print(parenthesized_group(exp))
(a*(b+c*(2-x))+d)

Still, this does not seem to me the best way to go, and I would like to know
whether it can be done with a regular expression. Notice that I don't need
to keep track of every nesting level of the parentheses; for me it is enough
to recognize the end of the first parenthesized group.
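Recognizing only the end of the first group still requires knowing how deep the nesting goes: Python's standard `re` module has no recursion construct, so no single pattern matches balanced groups of arbitrary depth. (The third-party `regex` package does support recursive patterns such as `(?R)`, but it is not in the standard library.) If an upper bound on the depth can be assumed, though, a pattern can be built by nesting a "no parentheses" core inside itself. A sketch under that assumption; `balanced_group_regex` and `max_depth` are hypothetical names:

```python
import re

def balanced_group_regex(max_depth):
    """Hypothetical helper: compile a pattern matching one parenthesized
    group whose nesting depth (outermost pair included) is <= max_depth."""
    if max_depth < 1:
        raise ValueError("max_depth must be at least 1")
    inner = r'[^()]*'                  # depth 0: no parentheses at all
    for _ in range(max_depth - 1):
        # text without parentheses, optionally interleaved with groups
        # that are one level shallower than the one being built
        inner = r'[^()]*(?:\(' + inner + r'\)[^()]*)*'
    return re.compile(r'\(' + inner + r'\)')

exp = '(a*(b+c*(2-x))+d)+f(s1)'

# match() anchors at position 0, so a depth bound that is too small
# gives None instead of silently returning an inner group.
print(balanced_group_regex(3).match(exp).group())   # (a*(b+c*(2-x))+d)
print(balanced_group_regex(2).match(exp))           # None
```

With `search()` instead of `match()`, a bound of 2 would return the inner group `(b+c*(2-x))`, which is why the anchored form is used. The pattern grows linearly with `max_depth`, and since `[^()]*` and `\(` can never match the same character, each step is forced by the next input character and the engine has little to backtrack over.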

Obviously, I would like a general recipe that works for more complicated
expressions: in particular, I cannot assume that the first group ends
right before a mathematical operator (like '+' in this case), since
these expressions are not necessarily mathematical expressions (as the
example might wrongly suggest). In general I have expressions of the
form

( ... contains nested expressions with parentheses ... ) ... other stuff

where other stuff may contain nested parentheses. I can assume that
there are no errors, i.e. that every opening parenthesis inside is
matched by a closing parenthesis.
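Given the no-errors assumption, a middle road is to let a regular expression find only the parentheses and keep the depth in an ordinary counter; this also hands back the "other stuff", nested parentheses and all. A sketch, with `split_first_group` a hypothetical name:

```python
import re

PAREN = re.compile(r'[()]')   # matches a single '(' or ')'

def split_first_group(exp):
    """Hypothetical helper: return (first_group, rest) for an expression
    that starts with '(' and has balanced parentheses."""
    if not exp.startswith('('):
        raise ValueError("expression must start with '('")
    depth = 0
    for m in PAREN.finditer(exp):
        depth += 1 if m.group() == '(' else -1
        if depth == 0:                 # the opening '(' has just been closed
            return exp[:m.end()], exp[m.end():]
    raise ValueError("unbalanced parentheses")

group, rest = split_first_group('(a*(b+c*(2-x))+d)+f(g(s1))')
print(group)  # (a*(b+c*(2-x))+d)
print(rest)   # +f(g(s1))
```

Compared with the character loop above, the loop body here runs once per parenthesis rather than once per character, and the rest of the string is skipped by the regex engine.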

Is this a problem that can be tackled with regular expressions?
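One way to keep regular expressions in the picture without any depth bound is to apply a simple pattern repeatedly: `\([^()]*\)` only ever matches innermost groups, so masking those with filler of the same length and repeating exposes the outer structure one level at a time, while every index stays valid in the original string. A sketch of that idea, with `first_group_by_masking` a hypothetical name and `#` an arbitrary filler character:

```python
import re

INNERMOST = re.compile(r'\([^()]*\)')   # a group with no parentheses inside

def first_group_by_masking(exp, filler='#'):
    """Hypothetical helper: return the leading parenthesized group of exp
    by repeatedly masking innermost groups with same-length filler.
    filler must not be '(' or ')'."""
    masked = exp
    while True:
        m = INNERMOST.match(masked)    # is the leading group flat yet?
        if m:
            return exp[:m.end()]       # same length, so same indices in exp
        # replace every innermost group with filler of equal length
        next_masked = INNERMOST.sub(lambda g: filler * len(g.group()), masked)
        if next_masked == masked:      # nothing left to mask
            raise ValueError("no balanced group at the start of exp")
        masked = next_masked

print(first_group_by_masking('(a*(b+c*(2-x))+d)+f(s1)'))  # (a*(b+c*(2-x))+d)
```

Each pass peels off roughly one nesting level and rescans the whole string, so this is slower than a single counting loop; for short expressions like the ones above that cost is unlikely to matter.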

TIA,

--
Michele Simionato - Dept. of Physics and Astronomy
210 Allen Hall Pittsburgh PA 15260 U.S.A.
Phone: 001-412-624-9041 Fax: 001-412-624-9163
Home-page: http://www.phyast.pitt.edu/~micheles/


