[Tutor] another re question

Ignacio Vazquez-Abrams ignacio@openservices.net
Fri, 5 Oct 2001 00:31:30 -0400 (EDT)


On Fri, 5 Oct 2001, Hy Python wrote:

> what if it's like this:
> html="""
> <html>
> <head>
> blahhhhhhlaaaa
> </head>
> <body>
> blahlbaaalalablaaa
> <script language="JavaScript">
> blahblahblahblahblahblahblah
> blahblahblahblah
> blahblahblahblah
> </script>
> dasdfas sdafblala
> balalala
> sddsfasdf
> <script language="JavaScript">
> blahblahblahblahblahblahblah
> blahblahblahblah
> blahblahblahblah
> </script>
> </body>
> </html>
> """
>
> How can I do a re.findall ?
> I used this:
> re.findall(r"\n?<script[^>]+?>.+?</script>",html)
> it does not work. Why?

Because the dot doesn't match a newline unless the re.S (re.DOTALL) flag is
set. Before Python 2.4, re.findall() takes no flags argument, so compile the
pattern with the flag (or start the pattern with "(?s)"). Use the following
instead:

---
import re
# non-greedy .*? so that each <script>...</script> block is matched separately
regex=re.compile(r'<script[^>]*>.*?</script>', re.S)
regex.findall(html)
---
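
To see the effect of the flag and of greedy vs. non-greedy matching side by
side, here is a small sketch. It assumes a Python whose re.findall() accepts a
flags argument (2.4 or later), and the html string is a shortened stand-in for
the one in the question, not the original:

```python
import re

# Shortened stand-in for the html string from the question (hypothetical contents).
html = """<body>
<script language="JavaScript">
first
</script>
text in between
<script language="JavaScript">
second
</script>
</body>"""

# No DOTALL: '.' cannot cross a newline, so nothing matches.
no_flag = re.findall(r"<script[^>]*>.*?</script>", html)

# DOTALL with a greedy '.*': a single match that runs from the first
# <script> all the way to the last </script>.
greedy = re.findall(r"<script[^>]*>.*</script>", html, flags=re.S)

# DOTALL with a non-greedy '.*?': one match per script block.
lazy = re.findall(r"<script[^>]*>.*?</script>", html, flags=re.S)

# The inline flag (?s) has the same effect as passing re.S.
inline = re.findall(r"(?s)<script[^>]*>.*?</script>", html)

print(len(no_flag), len(greedy), len(lazy), len(inline))  # prints: 0 1 2 2
```

The non-greedy form is the one you want here, since it returns each script
block as its own list item.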

-- 
Ignacio Vazquez-Abrams  <ignacio@openservices.net>