parenthesis

Mike C. Fletcher mcfletch at rogers.com
Mon Nov 4 16:43:57 EST 2002


http://simpleparse.sf.net/
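
If a full parser generator is more than you need, the counting approach from your post can be tightened a little. What follows is only a minimal sketch in plain Python, not SimpleParse code: first_group and its parameters are names made up for illustration, and it assumes single-character delimiters with no quoting or escaping of parentheses in the input.

```python
def first_group(text, open_ch="(", close_ch=")"):
    """Return (group, rest): the first balanced group and the text after it.

    Hypothetical helper for illustration; assumes single-character
    delimiters and no quoted or escaped delimiters in the input.
    """
    start = text.find(open_ch)
    if start == -1:
        raise ValueError("no opening delimiter found")
    depth = 0
    for i in range(start, len(text)):
        ch = text[i]
        if ch == open_ch:
            depth += 1
        elif ch == close_ch:
            depth -= 1
            if depth == 0:
                # The group closes here; split the text at this point.
                return text[start:i + 1], text[i + 1:]
    raise ValueError("unbalanced delimiters")


exp = '(a*(b+c*(2-x))+d)+f(s1)'
group, rest = first_group(exp)
print(group)  # (a*(b+c*(2-x))+d)
print(rest)   # +f(s1)
```

Unlike the original routine, this skips any text before the first '(' and raises an error on unbalanced input rather than returning a partial string.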

HTH,
Mike

Joshua Marshall wrote:

>Regular expressions are not powerful enough to be used to match
>strings when you need to be intelligent about nesting.  There are
>probably parser generators available--links anyone?
>
>For your particular application, also take a look at the "parser"
>Python module.  It's a little ugly, since it gives you complete
>(rather than abstract) syntax trees, but it may help you.
>
>
>Michele Simionato <mis6 at pitt.edu> wrote:
>
>>Suppose I want to parse the following expression:
>>
>> >>> exp='(a*(b+c*(2-x))+d)+f(s1)'
>>
>>I want to extract the first part, i.e. '(a*(b+c*(2-x))+d)'.
>>
>>Now if I use a greedy regular expression
>>
>> >>> import re; greedy=re.compile(r'\(.*\)')
>>
>>I obtain too much, the full expression:
>>
>> >>> match=greedy.search(exp); match.group()
>> '(a*(b+c*(2-x))+d)+f(s1)'
>>
>>On the other hand, if I use a nongreedy regular expression
>>
>> >>> nongreedy=re.compile(r'\(.*?\)')
>>
>>I obtain too little:
>>
>> >>> match=nongreedy.search(exp); match.group()
>> '(a*(b+c*(2-x)'
>>
>>Is there a way to specify a clever regular expression able to match
>>the first parenthesized group? What I did was to write a routine
>>to extract the first parenthesized group:
>>
>>def parenthesized_group(exp):
>>    # Collect characters until the nesting level returns to zero.
>>    # Assumes exp starts with '(' and the parentheses are balanced.
>>    nesting_level,out=0,[]
>>    for c in exp:
>>        out.append(c)
>>        if c=='(': nesting_level+=1
>>        elif c==')': nesting_level-=1
>>        if nesting_level==0: break
>>    return ''.join(out)
>>
>> >>> print(parenthesized_group(exp))
>> (a*(b+c*(2-x))+d)
>>
>>Still, this does not seem to me the best way to go, and I would like
>>to know if this can be done with a regular expression. Notice that I
>>don't need to track all the nesting levels of the parentheses; for me
>>it is enough to recognize the end of the first parenthesized group.
>>
>>Obviously, I would like a general recipe valid for more complicated
>>expressions: in particular, I cannot assume that the first group ends
>>right before a mathematical operator (like '+' in this case), since
>>these expressions are not necessarily mathematical expressions (as the
>>example could wrongly suggest). In general I have expressions of the
>>form
>>
>>( ... contains nested expressions with parentheses ... )...other stuff
>>
>>where other stuff may contain nested parentheses. I can assume that
>>there are no errors, i.e. that all the internal open parentheses are
>>matched by closing parentheses.
>>
>>Is this a problem which can be tackled with regular expressions?
>>
>>TIA,
>>
>>--
>>Michele Simionato - Dept. of Physics and Astronomy
>>210 Allen Hall Pittsburgh PA 15260 U.S.A.
>>Phone: 001-412-624-9041 Fax: 001-412-624-9163
>>Home-page: http://www.phyast.pitt.edu/~micheles/

-- 
_______________________________________
  Mike C. Fletcher
  Designer, VR Plumber, Coder
  http://members.rogers.com/mcfletch/