parenthesis
Mike C. Fletcher
mcfletch at rogers.com
Mon Nov 4 16:43:57 EST 2002
http://simpleparse.sf.net/
HTH,
Mike
Joshua Marshall wrote:
>Regular expressions are not powerful enough to be used to match
>strings when you need to be intelligent about nesting. There are
>probably parser generators available--links anyone?
>
>For your particular application, also take a look at the "parser"
>Python module. It's a little ugly, since it gives you complete
>(rather than abstract) syntax trees, but it may help you.
>
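A note on the "parser" module mentioned above: it was removed from the standard library in Python 3.10, and the "ast" module is the usual way to get abstract syntax trees today. The sketch below is only an illustration and rests on two assumptions: the input is a valid Python expression (which the original question says is not generally true), and Python 3.8 or later is available for ast.get_source_segment. Grouping parentheses are not nodes of their own in the abstract tree, so this shows the structure of the expression rather than recovering the parenthesized text itself.

```python
import ast

exp = "(a*(b+c*(2-x))+d)+f(s1)"

# mode="eval" parses a single expression rather than a whole module.
tree = ast.parse(exp, mode="eval")
top = tree.body  # the outermost operation: <left> + <right>

# Recover the source text of each operand (needs Python 3.8+).
left_src = ast.get_source_segment(exp, top.left)
right_src = ast.get_source_segment(exp, top.right)

print(type(top).__name__, type(top.op).__name__)
print(left_src)
print(right_src)
```

For input that is not Python source, this approach does not apply and a hand-written scanner or a parser generator is the better fit.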
>Michele Simionato <mis6 at pitt.edu> wrote:
>
>>Suppose I want to parse the following expression:
>>
>> >>> exp='(a*(b+c*(2-x))+d)+f(s1)'
>>
>>I want to extract the first part, i.e. '(a*(b+c*(2-x))+d)'.
>>
>>Now if I use a greedy regular expression
>>
>> >>> import re; greedy=re.compile(r'\(.*\)')
>>
>>I obtain too much, the full expression:
>>
>> >>> match=greedy.search(exp); match.group()
>>'(a*(b+c*(2-x))+d)+f(s1)'
>>
>>On the other hand, if I use a nongreedy regular expression
>>
>> >>> nongreedy=re.compile(r'\(.*?\)')
>>
>>I obtain too little:
>>
>> >>> match=nongreedy.search(exp); match.group()
>>'(a*(b+c*(2-x)'
>>
>>Is there a way to specify a clever regular expression able to match
>>the first parenthesized group? What I did was to write a routine
>>to extract the first parenthesized group:
>>
>>def parenthesized_group(exp):
>>    nesting_level,out=0,[]
>>    for c in exp:
>>        out.append(c)
>>        if c=='(': nesting_level+=1
>>        elif c==')': nesting_level-=1
>>        if nesting_level==0: break
>>    return ''.join(out)
>>
>> >>> print(parenthesized_group(exp))
>>(a*(b+c*(2-x))+d)
>>
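The routine quoted above assumes the string begins with '('; for any other first character it stops after one character. A minimal variant of the same depth-counting idea, assuming you want to start at the first '(' wherever it occurs and to be told about unbalanced input, might look like this (the name first_parenthesized_group is only illustrative):

```python
def first_parenthesized_group(text):
    """Return the first balanced '(...)' group in text, or None if there is no '('."""
    start = text.find("(")
    if start == -1:
        return None
    depth = 0
    for i in range(start, len(text)):
        ch = text[i]
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                # The group opened at `start` closes here.
                return text[start:i + 1]
    # Ran off the end while still inside a group.
    raise ValueError("unbalanced parentheses in %r" % (text,))


print(first_parenthesized_group("(a*(b+c*(2-x))+d)+f(s1)"))
```

Slicing the original string avoids building a list of characters, and the scan is a single pass over the text.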
>>Still, this does not seem to me the best way to go, and I would like to know
>>if it can be done with a regular expression. Notice that I don't need
>>to track all the nesting levels of the parentheses; for me it is enough
>>to recognize the end of the first parenthesized group.
>>
>>Obviously, I would like a general recipe valid for more complicated
>>expressions: in particular I cannot assume that the first group ends
>>right before a mathematical operator (like '+' in this case), since
>>these expressions are not necessarily mathematical expressions (as the
>>example could wrongly suggest). In general I have expressions of the
>>form
>>
>>( ... contains nested expressions with parentheses ... )...other stuff
>>
>>where "other stuff" may contain nested parentheses. I can assume that
>>there are no errors, i.e. that every internal open parenthesis is
>>matched by a closing parenthesis.
>>
>>Is this a problem which can be tackled with regular expressions?
>>
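To make the limitation concrete: with Python's "re" module a pattern can follow nesting only up to a depth that is fixed when the pattern is built. The sketch below builds such a pattern by repeated substitution; the helper name nested_paren_pattern and its max_inner_depth parameter are made up for this illustration, and the approach is meant to show the limit, not to recommend it.

```python
import re


def nested_paren_pattern(max_inner_depth):
    """Compile a pattern for '(...)' containing at most max_inner_depth nested levels."""
    # Content with no parentheses at all.
    content = r"[^()]*"
    for _ in range(max_inner_depth):
        # Content that may also contain groups one level deeper than before.
        content = r"[^()]*(?:\(" + content + r"\)[^()]*)*"
    return re.compile(r"\(" + content + r"\)")


exp = "(a*(b+c*(2-x))+d)+f(s1)"

# Two inner levels are enough for this expression.
deep_enough = nested_paren_pattern(2).search(exp).group()
# With one inner level the outer group cannot match, so an inner one is found.
too_shallow = nested_paren_pattern(1).search(exp).group()

print(deep_enough)   # (a*(b+c*(2-x))+d)
print(too_shallow)   # (b+c*(2-x))
```

Any input nested more deeply than the chosen limit silently produces the wrong group, which is why a counter or a real parser is the safer choice for arbitrary nesting.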
>>TIA,
>>
>>--
>>Michele Simionato - Dept. of Physics and Astronomy
>>210 Allen Hall Pittsburgh PA 15260 U.S.A.
>>Phone: 001-412-624-9041 Fax: 001-412-624-9163
>>Home-page: http://www.phyast.pitt.edu/~micheles/
>>
--
_______________________________________
Mike C. Fletcher
Designer, VR Plumber, Coder
http://members.rogers.com/mcfletch/