Parsing complex web pages safely with htmllib.HTMLParser

Jason Orendorff jason at jorendorff.com
Thu Jan 24 03:24:40 EST 2002


Bernard Yue wrote:
> I was trying to clean up the HTML a bit so that it would pass the parser.
> However, the page contains so many errors that I think I would have to
> spend another half an hour on it, so I stopped.

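HTML Tidy is worth a shot: it repairs malformed markup automatically, so
you can run the page through it before handing the result to the parser.
http://tidy.sourceforge.net/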
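
For what it's worth, here is a rough sketch of what "run it through Tidy
first" might look like. It is only an illustration under a few assumptions:
it uses the modern html.parser.HTMLParser rather than htmllib (which no
longer exists in Python 3), it assumes a command-line binary named "tidy"
is on your PATH, and the names tidy_html, LinkCollector and extract_links
are made up for this example. If the binary is missing, the sketch simply
falls back to the original markup.

```python
import shutil
import subprocess
from html.parser import HTMLParser


def tidy_html(raw, tidy_cmd="tidy"):
    """Return `raw` cleaned by HTML Tidy, or `raw` unchanged if Tidy is unavailable.

    `tidy_cmd` is an assumed executable name; adjust it for your system.
    """
    exe = shutil.which(tidy_cmd)
    if exe is None:
        # Tidy is not installed: hand back the original markup untouched.
        return raw
    # -q: quiet, -utf8: UTF-8 in and out,
    # --show-warnings no: less noise on stderr,
    # --force-output yes: still emit markup when Tidy reports errors.
    proc = subprocess.run(
        [exe, "-q", "-utf8", "--show-warnings", "no", "--force-output", "yes"],
        input=raw.encode("utf-8"),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # Tidy exits with 1 for warnings and 2 for errors, so a nonzero status
    # is not treated as a failure here; only empty output is.
    cleaned = proc.stdout.decode("utf-8", errors="replace")
    return cleaned if cleaned.strip() else raw


class LinkCollector(HTMLParser):
    """Hypothetical example parser: collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(raw, tidy_cmd="tidy"):
    """Clean the page with Tidy (when available), then parse it for links."""
    parser = LinkCollector()
    parser.feed(tidy_html(raw, tidy_cmd))
    parser.close()
    return parser.links
```

You would call extract_links(page_source) on the text of the page. Keeping
the cleanup step in its own function means the parsing code does not need
to change if you later swap Tidy for some other cleaner.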

## Jason Orendorff    http://www.jorendorff.com/



