re.DOTALL
Hans Nowak
wurmy at earthlink.net
Wed Nov 27 14:11:37 EST 2002
Irina Szabo wrote:
> I need to remove all tags from HTML files.
>
> matchstr = re.compile(r'''<.*?>''',re.DOTALL|re.MULTILINE)
> print matchstr.sub(" ", str )
>
> The program works well for tags located on one line,
> but doesn't delete tags if the brackets <> are on different lines, like
>
> <!--
> body { font-family: Arial, Helvetica, sans-serif; font-size: 10pt; color:
> #000000}
> -->
Hmm, it works for me:
>>> s = """
<!--
body { font-family: Arial, Helvetica, sans-serif; font-size: 10pt; color:
#000000}
-->
"""
>>> import re
>>> matchstr = re.compile(r'''<.*?>''',re.DOTALL|re.MULTILINE)
>>> print matchstr.sub(" ", s)
>>> print `matchstr.sub(" ", s)`
'\n \n'
>>>
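For comparison, here is a minimal sketch of what re.DOTALL changes for this pattern. It is written for Python 3 (an assumption; the session above is Python 2), and the names with_dotall and without_dotall are made up for illustration. Without DOTALL, "." stops at a newline, so a tag whose < and > sit on different lines is never matched. re.MULTILINE only changes how ^ and $ behave, so it makes no difference for this pattern and is left out here.

```python
import re

# Sample input modeled on the post: the "<" and ">" are on different lines.
s = """
<!--
body { font-family: Arial, Helvetica, sans-serif; font-size: 10pt; color:
#000000}
-->
"""

# With DOTALL, "." also matches newlines, so the non-greedy ".*?" can
# run from "<" across line breaks to the first ">".
with_dotall = re.compile(r"<.*?>", re.DOTALL)

# Without DOTALL, "." cannot cross a newline, so this multi-line tag
# is not matched and the text comes back unchanged.
without_dotall = re.compile(r"<.*?>")

print(repr(with_dotall.sub(" ", s)))       # '\n \n'
print(without_dotall.sub(" ", s) == s)     # True
```

If the tags survive in the real program, it may be worth checking that the compiled pattern with the DOTALL flag is the one actually being applied to the whole file contents rather than to one line at a time.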
--
Hans (base64.decodestring('d3VybXlAZWFydGhsaW5rLm5ldA=='))
# decode for email address ;-)
The Pythonic Quarter:: http://www.awaretek.com/nowak/
Kaa:: http://www.awaretek.com/nowak/kaa.html