Regular expression help
yaipa h.
yaipa at yahoo.com
Thu Jul 17 12:13:34 EDT 2003
Fredrik,
Not sure about the original poster, but I can use that. Thanks!
--Alan
"Fredrik Lundh" <fredrik at pythonware.com> wrote in message news:<mailman.1058424506.12031.python-list at python.org>...
> David Lees wrote:
>
> > I forget how to find multiple instances of stuff between tags using
> > regular expressions. Specifically I want to find all the text between a
> > series of begin/end pairs in a multiline file.
> >
> > I tried:
> > >>> p = 'begin(.*)end'
> > >>> m = re.search(p,s,re.DOTALL)
> >
> > and got everything between the first begin and last end. I guess
> > because of a greedy match. What I want to do is a list where each
> > element is the text between another begin/end pair.
>
> people will tell you to use non-greedy matches, but that's often a
> bad idea in cases like this: the RE engine has to store lots of back-
> tracking information, and your program will consume a lot more
> memory than it has to (and may run out of stack and/or memory).
>
> a better approach is to do two searches: first search for a "begin",
> and once you've found that, look for an "end"
>
> import re
>
> pos = 0
>
> START = re.compile("begin")
> END = re.compile("end")
>
> while 1:
>     m = START.search(text, pos)
>     if not m:
>         break
>     start = m.end()
>     m = END.search(text, start)
>     if not m:
>         break
>     end = m.start()
>     process(text[start:end])
>     pos = m.end() # move forward
>
> at this point, it's also obvious that you don't really have to use
> regular expressions:
>
> pos = 0
>
> while 1:
>     start = text.find("begin", pos)
>     if start < 0:
>         break
>     start += 5
>     end = text.find("end", start)
>     if end < 0:
>         break
>     process(text[start:end])
>     pos = end # move forward
>
> </F>
>
> <!-- (the eff-bot guide to) the python standard library (redux):
> http://effbot.org/zone/librarybook-index.htm
> -->
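
For anyone who wants the result as a list (what the original question asked for), here is a minimal sketch of the two-step find() loop quoted above, wrapped as a generator. The name iter_between and its begin/end parameters are hypothetical and not from the thread, and the sample string is made up for illustration. The non-greedy findall is included only as a sanity check on a small input; it says nothing about memory use on large files.

```python
import re

def iter_between(text, begin="begin", end="end"):
    """Yield each chunk found between a begin marker and the next end marker.

    Hypothetical helper: same two-step search as the quoted find() loop,
    but yielding chunks instead of calling process().
    """
    pos = 0
    while True:
        start = text.find(begin, pos)
        if start < 0:
            return                      # no more begin markers
        start += len(begin)             # skip past the begin marker itself
        stop = text.find(end, start)
        if stop < 0:
            return                      # unterminated block: stop quietly
        yield text[start:stop]
        pos = stop + len(end)           # resume after the end marker

# Made-up sample input spanning several lines.
sample = "begin one end\nnoise\nbegin two\nlines end"
chunks = list(iter_between(sample))

# Non-greedy regex over the same input, for comparison only.
non_greedy = re.findall(r"begin(.*?)end", sample, re.DOTALL)
```

On this sample both approaches give the same two chunks. Note that the markers are matched as plain substrings, so a word such as "pending" inside a block would be treated as an end marker, exactly as in the quoted code.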