re.DOTALL

Bengt Richter bokr at oz.net
Wed Nov 27 16:02:11 EST 2002


On Wed, 27 Nov 2002 13:46:56 -0500, Irina Szabo <irina at simbiosys.ca> wrote:

>
>I need to remove all tags from  HTML files.
>
>matchstr = re.compile(r'''<.*?>''',re.DOTALL|re.MULTILINE)
>print matchstr.sub(" ", str )
                         ^^^--- BTW, using builtin names for variables is not a good idea ;-)
>
>The program works well for tags located on one line,
>but doesn't delete tags if the brackets <> are on different lines, like
>
><!--
>body {  font-family: Arial, Helvetica, sans-serif; font-size: 10pt; color: 
>#000000}
>-->
>
>What is wrong? 
>
If you had posted what you actually did (noodge ;-),
it would probably have been clear, but ...

====< interactive snip >=======
 >>> aString = """
 ... something, followed by your html comment:
 ... <!--
 ... body {  font-family: Arial, Helvetica, sans-serif; font-size: 10pt; color:
 ... #000000}
 ... -->
 ... followed by this line.
 ... """
 >>> import re
 >>> matchstr = re.compile(r'''<.*?>''',re.DOTALL|re.MULTILINE)
 >>> print matchstr.sub(" ", aString)

 something, followed by your html comment:

 followed by this line.
===============================

... works, so you must have done something else.
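The post does not say what the original code did differently. One common way to get exactly the reported symptom, offered here only as a guess, is to read the file line by line and call sub() on each line. A tag whose < and > sit on different lines is then never seen whole, so re.DOTALL has nothing to act on. The Python 3 sketch below is illustrative only, and the helper names strip_whole, strip_per_line and strip_without_dotall are hypothetical. It also drops re.MULTILINE, which only changes how ^ and $ match and so has no effect on this pattern.

```python
import re

# Same pattern as in the thread. re.DOTALL lets "." match newlines, so one
# <...> match may span several lines. re.MULTILINE is omitted: it only
# affects ^ and $, which this pattern does not use.
TAG = re.compile(r"<.*?>", re.DOTALL)

html = """something, followed by an html comment:
<!--
body { font-family: Arial; color:
#000000}
-->
followed by this line.
"""

def strip_whole(text):
    """Apply the pattern to the whole document at once."""
    return TAG.sub(" ", text)

def strip_per_line(text):
    """Hypothetical mistake: apply the pattern to one line at a time.

    A tag whose '<' and '>' are on different lines never appears complete
    inside a single line, so it is left untouched.
    """
    return "".join(TAG.sub(" ", line) for line in text.splitlines(True))

def strip_without_dotall(text):
    """Same pattern without re.DOTALL: '.' stops at a newline."""
    return re.sub(r"<.*?>", " ", text)

print(strip_whole(html))
print(strip_per_line(html))
print(strip_without_dotall(html))
```

Only strip_whole removes the multi-line comment; the other two return the input unchanged, which matches the behaviour described in the question.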

Regards,
Bengt Richter


