Regular expression help

Bengt Richter bokr at oz.net
Thu Jul 17 11:57:22 EDT 2003


On Thu, 17 Jul 2003 04:27:23 GMT, David Lees <abcdebl2nonspammy at verizon.net> wrote:

>I forget how to find multiple instances of stuff between tags using 
>regular expressions.  Specifically I want to find all the text between a 
>series of begin/end pairs in a multiline file.
>
>I tried:
> >>> p = 'begin(.*)end'
> >>> m = re.search(p,s,re.DOTALL)
>
>and got everything between the first begin and last end.  I guess 
>because of a greedy match.  What I want to do is a list where each 
>element is the text between another begin/end pair.
>
You were close. For a non-greedy match, add a question mark after the greedy quantifier:

 >>> import re
 >>> s = """
 ... begin first end
 ... begin
 ... second
 ... end
 ... begin problem begin nested end end
 ... begin last end
 ... """
 >>> p = 'begin(.*?)end'
 >>> rx = re.compile(p, re.DOTALL)
 >>> rx.findall(s)
 [' first ', '\nsecond\n', ' problem begin nested ', ' last ']
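For comparison, here is a small standalone script (an illustration, not part of the original reply) that runs the same sample string through the greedy pattern, the non-greedy pattern, and the non-greedy pattern without re.DOTALL. The variable names are only for illustration.

```python
import re

# Same sample input as the interactive session above.
s = """
begin first end
begin
second
end
begin problem begin nested end end
begin last end
"""

# Greedy: .* runs to the end of the string and backtracks only as far as
# the LAST 'end', so everything collapses into a single match.
greedy = re.findall(r'begin(.*)end', s, re.DOTALL)

# Non-greedy: .*? stops at the FIRST 'end' after each 'begin'.
lazy = re.findall(r'begin(.*?)end', s, re.DOTALL)

# Without DOTALL, '.' does not match '\n', so the multi-line pair is skipped.
lazy_no_dotall = re.findall(r'begin(.*?)end', s)

print(len(greedy))       # 1
print(lazy)              # [' first ', '\nsecond\n', ' problem begin nested ', ' last ']
print(lazy_no_dotall)    # [' first ', ' problem begin nested ', ' last ']
```

The last line shows why re.DOTALL matters for a multiline file: without it, any begin/end pair that spans a line break is silently missed.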

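Notice what happened with the nested begin/end pair: the non-greedy match stops at
the first "end" it sees, so the inner "begin" lands inside the captured text and the
outer "end" is left over. If your data can nest, you will need more than a simple
regex approach, e.g. something that keeps track of the nesting depth.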
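One way to handle nesting, sketched here as an assumption rather than anything from the original reply: scan for the begin/end tokens with re.finditer and track the depth yourself, collecting only the outermost blocks. The helper name outer_blocks and the TOKEN pattern are hypothetical, and the \b word boundaries are an added assumption so that words such as "append" or "beginning" are not treated as markers.

```python
import re

# Hypothetical token pattern: whole-word 'begin' or 'end' only (assumption).
TOKEN = re.compile(r'\b(begin|end)\b')

def outer_blocks(text):
    """Return the text between each OUTERMOST begin/end pair, honoring nesting."""
    blocks = []
    depth = 0
    start = None
    for m in TOKEN.finditer(text):
        if m.group(1) == 'begin':
            if depth == 0:
                start = m.end()          # content starts right after the outer 'begin'
            depth += 1
        else:                            # an 'end' token
            if depth == 0:
                raise ValueError('unmatched end at offset %d' % m.start())
            depth -= 1
            if depth == 0:               # just closed an outermost pair
                blocks.append(text[start:m.start()])
    if depth != 0:
        raise ValueError('unclosed begin')
    return blocks

s = """
begin first end
begin
second
end
begin problem begin nested end end
begin last end
"""

print(outer_blocks(s))
```

With the sample string this returns [' first ', '\nsecond\n', ' problem begin nested end ', ' last ']: the nested pair now stays inside its outer block instead of cutting it short, and unbalanced input raises ValueError rather than producing a misleading match.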

Regards,
Bengt Richter



