NEWBIE: Removing HTML/JavaScript from a webpage

Owen Marshall malachi at NOSPAM.bardstowncable.net
Sun Jul 21 14:59:29 EDT 2002


Ok...here is my question. I have this bit of code:

import urllib

response = urllib.urlopen('http://movies.go.com/cgi/movielistings/request.dll?ZIPSPECIFIC&zip_code=40004&date=07/20/2002')

resp = response.read()

This grabs the movie playtimes for my area. Now, my big question -- how 
do I remove all of the junk JavaScript and HTML? I stole this bit of 
code from somewhere:

import re
split = re.sub("<[^>]*>","",resp)

But this only removes the HTML tags -- the JavaScript source between the 
<script> tags is still there, and I have no idea how to modify the 
pattern so that it eliminates the script as well.
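
For illustration, here is a minimal sketch of one possible approach: 
delete each whole <script>...</script> block first, then strip the 
remaining tags with the same pattern as above. The names strip_markup, 
SCRIPT_RE, TAG_RE and the sample string are made up for this example, and 
regexes like these are likely to miss malformed or unusual markup, so a 
real HTML parser would probably be more robust.

```python
import re

# Hypothetical names for illustration; none of these come from the original post.
# DOTALL lets "." span newlines, since script bodies usually cover several lines.
SCRIPT_RE = re.compile(r"<script\b[^>]*>.*?</script\s*>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]*>")

def strip_markup(html):
    """Remove <script> blocks (tags and contents), then any remaining tags."""
    without_scripts = SCRIPT_RE.sub("", html)
    return TAG_RE.sub("", without_scripts)

# Made-up sample input, standing in for the page text held in `resp`.
sample = "<html><script type='text/javascript'>var x = 1;</script><b>Movie</b> 7:30</html>"
print(strip_markup(sample))
```

The order matters in this sketch: if the tags are stripped first, the 
<script> markers are gone and the script body can no longer be located.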

Thanks!

-o



