parenthesis
Bengt Richter
bokr at oz.net
Mon Nov 4 17:05:11 EST 2002
On 4 Nov 2002 12:24:31 -0800, mis6 at pitt.edu (Michele Simionato) wrote:
>Suppose I want to parse the following expression:
>
>>>> exp='(a*(b+c*(2-x))+d)+f(s1)'
>
>I want to extract the first part, i.e. '(a*(b+c*(2-x))+d)'.
>
>Now if I use a greedy regular expression
>
>>>> import re; greedy=re.compile('\(.*\)')
>
>I obtain too much, the full expression:
>
>>>> match=greedy.search(exp); match.group()
>
>'(a*(b+c*(2-x))+d)+f(s1)'
>
>On the other hand, if I use a nongreedy regular expression
>
>>>> nongreedy=re.compile('\(.*?\)')
>
>I obtain too little:
>
>>>> match=nongreedy.search(exp); match.group()
>
>'(a*(b+c*(2-x)'
>
>Is there a way to specify a clever regular expression able to match
>the first parenthesized group? What I did was to write a routine
>to extract the first parenthesized group:
>
>def parenthesized_group(exp):
>    nesting_level,out=0,[]
>    for c in exp:
>        out.append(c)
>        if c=='(': nesting_level+=1
>        elif c==')': nesting_level-=1
>        if nesting_level==0: break
>    return ''.join(out)
>
>>>> print parenthesized_group(exp)
>
>(a*(b+c*(2-x))+d)
>
>Still, this does not seem to me the best way to go, and I would like to
>know if this can be done with a regular expression. Notice that I don't
>need to control all the nesting levels of the parentheses; for me it is
>enough to recognize the end of the first parenthesized group.
>
>Obviously, I would like a general recipe valid for more complicated
>expressions: in particular I cannot assume that the first group ends
>right before a mathematical operator (like '+' in this case) since
>these expressions are not necessarily mathematical expressions (as the
>example could wrongly suggest). In general I have expressions of the
>form
>
>( ... contains nested expressions with parenthesis... )...other stuff
>
>where other stuff may contain nested parentheses. I can assume that
>there are no errors, i.e. that all the internal open parentheses are
>matched by closing parentheses.
>
>Is this a problem which can be tackled with regular expressions ?
>
Well, regular expressions don't count, so if you want to count you have
to throw in something extra. E.g., you could do this: insert a delimiter
after the closing paren of each top-level group, and then split on the
delimiter. Probably not wonderfully efficient, and I am just duplicating
what you did, except that the regex separates the chunks for me.
>>> import re
>>> rx = re.compile(r'([()]|[^()]*)')
>>> class Addelim:
...     def __init__(self): self.parens=0
...     def __call__(self, m):
...         s = m.group(1)
...         if s=='(': self.parens+=1
...         if self.parens==1 and s==')':
...             self.parens=0
...             return s+'\x00'
...         if s==')': self.parens -=1
...         return s
...
>>> for e in rx.sub(Addelim(),exp).split('\x00'): print e
...
(a*(b+c*(2-x))+d)
+f(s1)
Where exp was
>>> exp
'(a*(b+c*(2-x))+d)+f(s1)'
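If you only need the first group, the same counting idea can probably be
done without building the delimited string: let the regex find just the
parens and keep the depth count in a plain loop. What follows is only a
sketch, assuming the parens are balanced as you said; first_group and
PAREN are names made up for this illustration, not anything from the re
module.

```python
import re

# Matches a single paren; finditer skips over everything else for us.
PAREN = re.compile(r'[()]')

def first_group(text):
    """Return (group, rest): the first balanced parenthesized group
    in text, and whatever follows it."""
    depth = 0
    start = None
    for m in PAREN.finditer(text):
        if m.group() == '(':
            if depth == 0:
                start = m.start()   # the first group opens here
            depth += 1
        else:
            if depth == 0:
                raise ValueError('unmatched ")" at position %d' % m.start())
            depth -= 1
            if depth == 0:
                # Back at top level: this ")" closes the first group.
                return text[start:m.end()], text[m.end():]
    raise ValueError('no balanced parenthesized group found')
```

Calling first_group(exp) returns the pair of the first group and the
remainder, so the "other stuff" can be fed back in if more groups are
wanted. The counting still happens outside the regex; the regex only
saves looking at every character by hand.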
Regards,
Bengt Richter