Regular expression help
Bengt Richter
bokr at oz.net
Thu Jul 17 11:57:22 EDT 2003
On Thu, 17 Jul 2003 04:27:23 GMT, David Lees <abcdebl2nonspammy at verizon.net> wrote:
>I forget how to find multiple instances of stuff between tags using
>regular expressions. Specifically I want to find all the text between a
>series of begin/end pairs in a multiline file.
>
>I tried:
> >>> p = 'begin(.*)end'
> >>> m = re.search(p,s,re.DOTALL)
>
>and got everything between the first begin and last end. I guess
>because of a greedy match. What I want to do is a list where each
>element is the text between another begin/end pair.
>
You were close. To make the match non-greedy, add a question mark after the
greedy quantifier (.* becomes .*?), and use findall instead of search to get
every match back as a list:
>>> import re
>>> s = """
... begin first end
... begin
... second
... end
... begin problem begin nested end end
... begin last end
... """
>>> p = 'begin(.*?)end'
>>> rx = re.compile(p, re.DOTALL)
>>> rx.findall(s)
[' first ', '\nsecond\n', ' problem begin nested ', ' last ']
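Notice what happened with the nested begin-ends: the non-greedy match stops
at the first 'end' it finds, so the third element is cut off at the inner
'end' and the outer 'end' is left unmatched. If you have nesting, you will
need more than a simple regex approach.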
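One possible way to handle nesting is to use the regex only to locate the
delimiters and keep a depth counter yourself. The following is a minimal
sketch, not a definitive solution. The function name outermost_blocks is
made up for illustration, and it assumes that 'begin' and 'end' count as
delimiters only when they appear as whole words (the pattern above does not
make that assumption) and that a stray 'end' with no open 'begin' can be
ignored.

```python
import re

def outermost_blocks(text):
    """Return the text inside each outermost begin/end pair.

    Assumption: 'begin' and 'end' are delimiters only as whole words.
    """
    token = re.compile(r'\b(begin|end)\b')
    blocks = []
    depth = 0      # number of 'begin's currently open
    start = None   # index just past the outermost open 'begin'
    for m in token.finditer(text):
        if m.group(1) == 'begin':
            if depth == 0:
                start = m.end()
            depth += 1
        elif depth > 0:  # ignore a stray 'end' with no open 'begin'
            depth -= 1
            if depth == 0:
                blocks.append(text[start:m.start()])
    return blocks

s = """
begin first end
begin
second
end
begin problem begin nested end end
begin last end
"""
print(outermost_blocks(s))
```

With the same test string, the third element now keeps the inner pair
intact (' problem begin nested end ') instead of being cut off at the
inner 'end'. An unclosed 'begin' at the end of the input is silently
dropped in this sketch; raise an error there if that matters to you.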
Regards,
Bengt Richter