Trying to parse invalid HTML documents with HTMLParser
florent
florent.newsgroups at kynesthesy.org
Tue Aug 2 15:30:43 EDT 2005
I'm trying to parse HTML documents from the web using the HTMLParser
class of the HTMLParser module (Python 2.3), but some web documents are
not fully valid. When the parser finds an invalid tag, it raises an
exception, and it then seems impossible to resume parsing just after
the point where the exception was raised. I'd like to continue parsing
an HTML document even if an invalid tag is found. Is this possible?
Here is a small invalid HTML document:
----------
<html>
<body>
<a href="""">bogus link</a>
</body>
</html>
----------
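A minimal sketch of the situation described above, for illustration only.
The names LinkCollector and collect_links are hypothetical and are not
taken from the original program. The sketch is written for newer
interpreters (the "except ... as" syntax needs Python 2.6 or later), so
it is not exactly what would run on Python 2.3. On old versions the
exception raised is HTMLParseError; Python 3.5 and later removed that
exception and parse leniently, so there the error slot is expected to
stay empty.

```python
try:
    from HTMLParser import HTMLParser   # Python 2 module name, as in the post
except ImportError:
    from html.parser import HTMLParser  # Python 3 location of the same class


class LinkCollector(HTMLParser):
    """Hypothetical handler: records the href of every <a> start tag."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)


def collect_links(document):
    """Return (links, error); error is None if parsing ran to the end."""
    parser = LinkCollector()
    try:
        parser.feed(document)
        parser.close()
    except Exception as exc:
        # On old versions this is where HTMLParseError would land.
        # Only the links seen before the bad tag have been collected,
        # which is the problem the post describes.
        return parser.links, exc
    return parser.links, None
```

Calling collect_links on the document above returns whatever links were
gathered before any error, together with the exception (if one was
raised), which makes it easy to see where parsing stopped.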