re.DOTALL

Hans Nowak wurmy at earthlink.net
Wed Nov 27 14:11:37 EST 2002


Irina Szabo wrote:
> I need to remove all tags from HTML files.
> 
> matchstr = re.compile(r'''<.*?>''',re.DOTALL|re.MULTILINE)
> print matchstr.sub(" ", str )
> 
> The program works well for tags located on one line,
> but doesn't delete tags if the brackets <> are on different lines, like
> 
> <!--
> body {  font-family: Arial, Helvetica, sans-serif; font-size: 10pt; color: 
> #000000}
> -->

Hmm, it works for me:

 >>> s = """
<!--
body {  font-family: Arial, Helvetica, sans-serif; font-size: 10pt; color:
#000000}
-->
"""
 >>> import re
 >>> matchstr = re.compile(r'''<.*?>''',re.DOTALL|re.MULTILINE)
 >>> print matchstr.sub(" ", s)



 >>> print `matchstr.sub(" ", s)`
'\n \n'
 >>>
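
The session above uses Python 2 syntax (the print statement and backticks
for repr). Here is a rough Python 3 sketch of the same check. The names
strip_tags, TAG, no_dotall and per_line are made up for illustration. It
also shows that re.DOTALL is the flag doing the work here, since
re.MULTILINE only changes how ^ and $ behave. The last part is only a guess
at why the original program might have failed (the text being fed to the
regex one line at a time); the original post does not confirm this.

```python
import re

# DOTALL lets "." match newlines too; MULTILINE only changes ^ and $,
# so it is not needed for this pattern.
TAG = re.compile(r"<.*?>", re.DOTALL)

def strip_tags(text):
    """Replace every <...> span, even one spanning lines, with a space."""
    return TAG.sub(" ", text)

s = """
<!--
body {  font-family: Arial, Helvetica, sans-serif; font-size: 10pt; color:
#000000}
-->
"""

print(repr(strip_tags(s)))  # '\n \n'

# Without DOTALL, "." stops at a newline, so the multi-line comment survives.
no_dotall = re.compile(r"<.*?>")
print(no_dotall.sub(" ", s) == s)  # True

# Assumption: if the input is processed line by line, no single line holds
# both "<" and ">", so nothing is removed even with DOTALL set.
per_line = "".join(strip_tags(line) for line in s.splitlines(keepends=True))
print(per_line == s)  # True
```

If that guess is right, reading the whole file into one string before
calling sub() should give the same result as in the session above.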

-- 
Hans (base64.decodestring('d3VybXlAZWFydGhsaW5rLm5ldA=='))
# decode for email address ;-)
The Pythonic Quarter:: http://www.awaretek.com/nowak/
Kaa:: http://www.awaretek.com/nowak/kaa.html



