NEWBIE: Removing HTML/JavaScript from a webpage
Owen Marshall
malachi at NOSPAM.bardstowncable.net
Sun Jul 21 14:59:29 EDT 2002
OK, here is my question. I have this bit of code:
import urllib
# Python 2: fetch the listings page and read the raw HTML into a string
response = urllib.urlopen('http://movies.go.com/cgi/movielistings/request.dll?ZIPSPECIFIC&zip_code=40004&date=07/20/2002')
resp = response.read()
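The snippet above is Python 2 (`urllib.urlopen`). For readers on Python 3, here is a minimal sketch of the same fetch: the call lives at `urllib.request.urlopen` and `read()` returns bytes, so the body has to be decoded. The function name `fetch_page` and the UTF-8 fallback are assumptions for illustration.

```python
import urllib.request


def fetch_page(url):
    """Download url and return the body as a str (Python 3 sketch)."""
    with urllib.request.urlopen(url) as response:
        raw = response.read()  # bytes, not str, in Python 3
        # Prefer the charset the server declares; UTF-8 is an assumed fallback.
        charset = response.headers.get_content_charset() or "utf-8"
    return raw.decode(charset, errors="replace")
```

Usage would be `resp = fetch_page('http://movies.go.com/cgi/movielistings/request.dll?ZIPSPECIFIC&zip_code=40004&date=07/20/2002')`, though a 2002 URL like this may no longer be online.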
This grabs the movie showtimes for my area. Now, my big question: how
do I remove all of the junk JavaScript and HTML? I stole this bit of
code from somewhere:
import re
# delete every <...> tag; any text between tags is kept
split = re.sub(r"<[^>]*>", "", resp)
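To see exactly what that one-liner keeps, here it is run on a tiny made-up page (the sample HTML is hypothetical, not taken from the movie site). The tags vanish, but the script's source code survives, because `<[^>]*>` only matches from a `<` to the next `>` and never looks at what lies between an opening and a closing tag.

```python
import re

# A small hypothetical page standing in for the real movie listing.
sample = ('<html><head>\n'
          '<script type="text/javascript">var x = 1; document.write("ad");</script>\n'
          '</head><body><b>Movie</b> 7:30</body></html>')

# The same substitution as in the post: delete every <...> tag.
stripped = re.sub(r"<[^>]*>", "", sample)
print(stripped)
```

The printed text still contains `var x = 1; document.write("ad");` next to `Movie 7:30`.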
But this only removes the tags themselves: the JavaScript between <script>
and </script> is still there, and I have no idea how to modify the pattern
so it also eliminates the scripts.
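One common regex-only approach is to delete whole `<script>...</script>` (and `<style>...</style>`) blocks, contents included, before stripping the remaining tags. The sketch below assumes Python 3 (`html.unescape` needs 3.4+); the function name `strip_html` and the extra comment-removal step are illustrative choices, not something from the original post.

```python
import html
import re

# Whole <script>...</script> and <style>...</style> blocks, contents included.
# re.DOTALL lets "." cross newlines, re.IGNORECASE also catches <SCRIPT>,
# and the non-greedy ".*?" stops at the nearest matching closing tag.
BLOCK_RE = re.compile(r"<(script|style)\b[^>]*>.*?</\1\s*>",
                      re.DOTALL | re.IGNORECASE)
# HTML comments, which older pages often used to hide script code.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)
# Any remaining tag, as in the original one-liner.
TAG_RE = re.compile(r"<[^>]*>")


def strip_html(page):
    """Return the text of page (a str) without scripts, styles, comments or tags."""
    text = BLOCK_RE.sub("", page)    # 1. drop script/style blocks entirely
    text = COMMENT_RE.sub("", text)  # 2. drop comments
    text = TAG_RE.sub("", text)      # 3. drop the remaining tags
    return html.unescape(text)       # 4. turn &amp; etc. back into characters


if __name__ == "__main__":
    page = ('<html><head><SCRIPT language="JavaScript">\n'
            'if (a > b) { alert("x"); }\n'
            '</SCRIPT><style>b {color: red}</style></head>\n'
            '<body><!-- ad --><b>Example Movie</b> 7:30 &amp; 9:45</body></html>')
    print(strip_html(page).strip())  # prints: Example Movie 7:30 & 9:45
```

Regexes are not an HTML parser: a stray `<` in visible text or a `>` inside an attribute value can still confuse them. For a sturdier job, the standard library's HTML parser (`html.parser.HTMLParser` in Python 3, the `HTMLParser` module in Python 2) lets you collect text only while you are outside `<script>` and `<style>` elements.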
Thanks!
-o
More information about the Python-list mailing list