parenthesis
Michele Simionato
mis6 at pitt.edu
Mon Nov 4 15:24:31 EST 2002
Suppose I want to parse the following expression:
>>> exp='(a*(b+c*(2-x))+d)+f(s1)'
I want to extract the first part, i.e. '(a*(b+c*(2-x))+d)'.
Now if I use a greedy regular expression
>>> import re; greedy = re.compile(r'\(.*\)')
I get too much, namely the full expression:
>>> match=greedy.search(exp); match.group()
'(a*(b+c*(2-x))+d)+f(s1)'
On the other hand, if I use a non-greedy regular expression
>>> nongreedy = re.compile(r'\(.*?\)')
I get too little:
>>> match=nongreedy.search(exp); match.group()
'(a*(b+c*(2-x)'
Is there a clever regular expression able to match the first
parenthesized group? What I did was to write a routine that
extracts the first parenthesized group:
def parenthesized_group(exp):
    # Assumes exp starts with '(': copy characters until that
    # first parenthesis is closed again.
    nesting_level, out = 0, []
    for c in exp:
        out.append(c)
        if c == '(': nesting_level += 1
        elif c == ')': nesting_level -= 1
        if nesting_level == 0: break
    return ''.join(out)
>>> print(parenthesized_group(exp))
(a*(b+c*(2-x))+d)
Still, this does not seem to me the best approach, and I would like to know
whether this can be done with a regular expression. Notice that I don't need
to keep track of every nesting level of the parentheses: it is enough for me
to recognize the end of the first parenthesized group.
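For reference: balanced parentheses of unbounded depth do not form a regular language, so a plain regular expression (Python's standard `re` included) cannot match them in general. If a maximum nesting depth is known in advance, however, a pattern can be built up one level at a time. The sketch below assumes such a bound; `nested_group_pattern` and `max_depth` are illustrative names, not part of any library.

```python
import re

def nested_group_pattern(max_depth):
    """Compile a pattern for a (...) group nested at most max_depth levels deep.

    Hypothetical helper: plain regular expressions cannot count, so the
    depth bound must be chosen in advance.
    """
    body = r'[^()]*'  # innermost level: no parentheses at all
    for _ in range(max_depth - 1):
        # each extra level: any mix of non-parenthesis characters
        # and complete groups of the previous level
        body = r'(?:[^()]|\(' + body + r'\))*'
    return re.compile(r'\(' + body + r'\)')

exp = '(a*(b+c*(2-x))+d)+f(s1)'
print(nested_group_pattern(3).match(exp).group())  # (a*(b+c*(2-x))+d)
print(nested_group_pattern(2).match(exp))          # None: depth 2 is too shallow
```

Using `match` rather than `search` anchors the pattern at the first character, so a bound that is too small gives None instead of silently returning an inner group such as '(b+c*(2-x))'.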
Obviously, I would like a general recipe valid for more complicated
expressions: in particular, I cannot assume that the first group ends
right before a mathematical operator (like '+' in this case), since
these expressions are not necessarily mathematical expressions (as the
example could wrongly suggest). In general I have expressions of the
form
( ... contains nested expressions with parentheses ... ) ... other stuff
where the other stuff may also contain nested parentheses. I can assume
that there are no errors, i.e. that every opening parenthesis is
matched by a closing one.
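Without a depth bound, a common compromise is to let a regular expression find only the parentheses and keep the nesting count in Python; the scan stops at the end of the first group regardless of what follows it. This is a sketch assuming balanced input, as stated above; `first_group` is a hypothetical helper name.

```python
import re

# Only '(' and ')' matter for finding where a group ends, so the regex
# engine is used just to jump from one parenthesis to the next.
PAREN = re.compile(r'[()]')

def first_group(text):
    """Return the first balanced (...) group in text.

    Hypothetical helper: raises ValueError on a stray ')' or if no
    group is ever closed.
    """
    depth = 0
    start = None
    for m in PAREN.finditer(text):
        if m.group() == '(':
            if depth == 0:
                start = m.start()  # opening parenthesis of the first group
            depth += 1
        else:  # a closing parenthesis
            if depth == 0:
                raise ValueError('unmatched ) at index %d' % m.start())
            depth -= 1
            if depth == 0:
                return text[start:m.end()]  # first group is complete
    raise ValueError('no complete parenthesized group found')

print(first_group('(a*(b+c*(2-x))+d)+f(s1)'))
print(first_group('(if (eq x 1) (print "one")) (other (stuff))'))
```

Text before the first '(' is skipped, and because `finditer` yields matches lazily, the other stuff after the first group is never examined.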
Is this a problem that can be tackled with regular expressions?
TIA,
--
Michele Simionato - Dept. of Physics and Astronomy
210 Allen Hall Pittsburgh PA 15260 U.S.A.
Phone: 001-412-624-9041 Fax: 001-412-624-9163
Home-page: http://www.phyast.pitt.edu/~micheles/