HTMLParser rejects real-life tagsoup
Rene Pijlman
reageer.in at de.nieuwsgroep
Mon Feb 10 18:50:29 EST 2003
I've been using the HTMLParser module to process external web
pages that I don't control. HTMLParser seems to be rather strict
about the HTML syntax and quickly raises an exception when
confronted with real-life tagsoup.
I can't say I blame it. I'm fully aware of the concepts of
language, syntax and parser :-)
Any suggestions on how to handle this? Is there a more liberal
HTML parser available somewhere, one that tries to make the
best of it, like most browsers do?
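
For illustration, here is a minimal sketch of the kind of processing described above. It assumes a current Python 3, where the module lives at html.parser (it was named HTMLParser in Python 2) and where, as far as I know, the parser is tolerant of malformed markup rather than raising. The LinkCollector class, the extract_links helper and the sample markup are hypothetical names made up for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects href values from <a> start tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(markup):
    """Feed markup to the parser and return the hrefs it found."""
    parser = LinkCollector()
    parser.feed(markup)
    parser.close()  # flush any buffered trailing text
    return parser.links


# Made-up tag soup: unclosed <p>, misnested </a></b>, unquoted attribute
soup = '<p>Unclosed paragraph<a href="a.html">one<b>bold</a></b><a href=b.html>two'
print(extract_links(soup))
```

The parser only reports tags in the order it sees them and makes no attempt to repair the nesting, so this is closer to "make the best of it" than to what a browser does when it rebuilds a document tree.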
--
René Pijlman
What do you want to learn? http://www.leren.nl