[newbie] Strange behavior of the re module

Fred fred at acme.com
Sat Aug 21 16:26:48 EDT 2004


On Sat, 21 Aug 2004 17:10:09 +0200, Fred <fred at acme.com> wrote:
>The script does run, but 

Guess I hit the Send button instead of Save ;-)

OK, for those interested, here's some working code, although it's
pretty slow (about 2 min 30 s to process a 200 KB file on a P3 host):

--------------------
#The goal is to read an HTML file, extract whatever's between <body>
#and </body>, read a template file, and insert what we extracted from
#the first document into the template:

import sys
import re

fp=open("./mydoc.html")
input = fp.read()
fp.close()

#Needed if the document contains any backslash: re.sub() below treats
#backslashes in its replacement string as escapes, so double them
input = input.replace('\\', '\\\\')
body = re.search('<body.*?>(.*?)</body>',input,re.IGNORECASE |
re.DOTALL)
if body:
	body = body.group(1)
else:
	body = "no body section found"

fp=open("./template.tpl")
output = fp.read()
fp.close()

body = body + "</body>"
output = re.sub('</body>', body, output)
fp=open("./mynewfile.html","w")
fp.write(output)
fp.close()

--------------------
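
For what it's worth, here is a minimal sketch of the same idea that avoids doubling the backslashes by hand. It assumes that passing a function as the replacement to re.sub() is acceptable, since the function's return value is inserted as-is, with no escape processing. The names extract_body, fill_template and merge_files are made up for illustration, and I have not timed this against the script above, so I make no claim about speed:

```python
import re

# Same pattern as the script above, compiled once.
BODY_RE = re.compile(r'<body.*?>(.*?)</body>', re.IGNORECASE | re.DOTALL)


def extract_body(html):
    """Return the text between <body ...> and </body>, or None if absent."""
    match = BODY_RE.search(html)
    return match.group(1) if match else None


def fill_template(template, body):
    """Insert body in front of every </body> tag found in template."""
    # The replacement is a function, so its return value is used
    # literally and backslashes in body need no doubling.
    return re.sub('</body>', lambda m: body + '</body>', template)


def merge_files(source_path, template_path, output_path):
    """Hypothetical helper: read both files, merge, write the result."""
    # "with" closes each file even if an error occurs.
    with open(source_path) as fp:
        body = extract_body(fp.read())
    if body is None:
        body = "no body section found"
    with open(template_path) as fp:
        template = fp.read()
    with open(output_path, "w") as fp:
        fp.write(fill_template(template, body))
```

Calling merge_files("./mydoc.html", "./template.tpl", "./mynewfile.html") would mirror the script above. Since '</body>' is a fixed string, template.replace('</body>', body + '</body>') should work too and skips the re module for that step.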

Thx everyone for the hints
Fred.


