[Tutor] another re question [using findall() with flags]

Danny Yoo dyoo@hkn.eecs.berkeley.edu
Thu, 4 Oct 2001 22:50:25 -0700 (PDT)


On Fri, 5 Oct 2001, Ignacio Vazquez-Abrams wrote:

> > How can I do a re.findall ?
> > I used this:
> > re.findall(r"\n?<script[^>]+?>.+?</script>",html)
> > it does not work. Why?
> 
> Because the dot doesn't match the newline unless you pass re.S or
> re.DOTALL as a flag, which you cannot do with re.findall(). Use the
> following instead:

It is possible to embed this DOTALL-ish flag directly in our findall()
pattern, by putting the inline flag "(?s)" at the front of it:

###
>>> re.findall(r"(?s)Hello.+?\.", """Hello world.
... Hello, my name is
... Inigo Montoya.
... """)
['Hello world.', 'Hello, my name is\nInigo Montoya.']
###

See:

    http://www.python.org/doc/lib/re-syntax.html

for more details.  However, I agree with Ignacio: use re.compile() instead
of "(?s)", just to avoid putting yet another mysterious meta-pattern-like
thing in that complicated string.
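
As a rough sketch of the re.compile() approach, here is the pattern from
the original question compiled with re.DOTALL. The html string below is
made up for illustration, since the poster's actual input isn't shown, and
the name script_re is just my own choice:

```python
import re

# Hypothetical sample input; the original poster's HTML is not shown.
html = """<html>
<script type="text/javascript">
var x = 1;
</script>
<p>Hello</p>
<script language="JavaScript">alert("hi");</script>
</html>"""

# Same pattern as in the question, compiled with re.DOTALL so that "."
# also matches newline characters.
script_re = re.compile(r"\n?<script[^>]+?>.+?</script>", re.DOTALL)

matches = script_re.findall(html)
print(len(matches))    # 2: both the multi-line and the one-line block
```

Without the flag, the same pattern only finds the second, single-line
script block in this sample.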



Also, do you know about using re.VERBOSE yet?  If you're really enamored
with regular expressions, you should do as much as you can to make sure
you can still read those regexes the day after tomorrow.  The re.VERBOSE
flag will let you spread a regular expression across several lines, even
allowing comments:

###
>>> hello_re = re.compile(r"""
...     Hello                       ## Let's begin with a "Hello"
...     .+?                         ## followed by some nongreedy
...                                 ## characters,
...     \.                          ## and top it off with a period.
...     """, re.S | re.VERBOSE)     ## The bitwise or is intentional.
>>> hello_re.findall("""Hello world.
... Hello, my
... name is
... Inigo Montoya.""")
['Hello world.', 'Hello, my\nname is\nInigo Montoya.']
###
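
One thing to watch for with re.VERBOSE: since whitespace in the pattern is
ignored, a literal space has to be spelled out, for example as "[ ]" or as
a backslash-escaped space. A small sketch of the difference (the names
loose_re and strict_re are only for illustration):

```python
import re

# In verbose mode, unescaped whitespace in the pattern is ignored.
loose_re = re.compile(r"""
    Hello world       # the space is dropped, so this matches "Helloworld"
    """, re.VERBOSE)

# Inside a character class, the space is kept.
strict_re = re.compile(r"""
    Hello [ ] world   # "[ ]" matches exactly one literal space
    """, re.VERBOSE)

text = "Hello world. Helloworld."
print(loose_re.findall(text))     # ['Helloworld']
print(strict_re.findall(text))    # ['Hello world']
```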


Good luck to you.