HTMLParser rejects real-life tagsoup

Rene Pijlman reageer.in at de.nieuwsgroep
Mon Feb 10 18:50:29 EST 2003


I've been using the HTMLParser module to process external web
pages that I don't control. HTMLParser seems to be rather strict
about the HTML syntax and quickly raises an exception when
confronted with real-life tagsoup.
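
For concreteness, here is a minimal sketch of the kind of usage described above. It assumes Python 3, where the module lives at html.parser; the LinkCollector class, the collect_links helper, and the idea of extracting links are hypothetical illustrations, not the actual code in question.

```python
# Python 3 location of the parser; under Python 2 the module was named "HTMLParser".
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical subclass that records the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # tag arrives lowercased; attrs is a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def collect_links(markup):
    """Return (links, error); error is None when parsing finished cleanly."""
    parser = LinkCollector()
    try:
        parser.feed(markup)
        parser.close()
    except Exception as exc:
        # The strict Python 2 parser raised HTMLParseError at this point.
        # Keep whatever was collected before the failure.
        return parser.links, exc
    return parser.links, None
```

Calling collect_links(page_text) on an external page returns the links gathered so far together with any exception, so one bad page does not abort a whole run.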

I can't say I blame it. I'm fully aware of the concepts of
language, syntax and parser :-)

Any suggestions on how to handle this? Is there a more liberal
HTML parser available somewhere that tries to make the best of
malformed markup, like most browsers do?

-- 
René Pijlman

What do you want to learn?  http://www.leren.nl




More information about the Python-list mailing list