Stripping scripts from HTML with regular expressions
Michel Bouwmans
mfb.chikazuku at gmail.com
Wed Apr 9 15:38:09 EDT 2008
Hey everyone,
I'm trying to strip all script blocks from an HTML file using a regex.
I tried the following in Python:
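import re

testfile = open('testfile')
testhtml = testfile.read()
# Raw string: in a plain string literal '\b' is a backspace, not a word boundary.
regex = re.compile(r'<script\b[^>]*>(.*?)</script>', re.DOTALL)
# Apply the substitution to the text read from the file.
result = regex.sub('', testhtml)
print(result)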
This strips away far more than just the script blocks. Am I missing
something about Python's regex implementation, or am I doing something
else wrong?
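
To make the intended behaviour concrete, here is a minimal self-contained sketch of what I expect the pattern to do. The sample string and the strip_scripts name are made up for illustration and are not taken from my real file. The re.IGNORECASE flag is an extra assumption so that upper-case tags are caught too.

```python
import re

# Raw string so that \b reaches the regex engine as a word boundary
# instead of being converted to a backspace character by Python.
SCRIPT_RE = re.compile(r'<script\b[^>]*>.*?</script>',
                       re.DOTALL | re.IGNORECASE)


def strip_scripts(html):
    """Remove every <script>...</script> block (hypothetical helper)."""
    # .*? is non-greedy, so each match stops at the nearest </script>.
    return SCRIPT_RE.sub('', html)


# Made-up sample input with a multi-line block and an upper-case block.
sample = ('<p>a</p><script type="text/javascript">\nalert(1);\n</script>'
          '<p>b</p><SCRIPT>x</SCRIPT><p>c</p>')
print(strip_scripts(sample))
```

With this sample I would expect only the three paragraphs to remain. A regex like this is still only an approximation: it would be fooled by, for example, a literal "</script>" inside a JavaScript string.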
greetz
MFB