HTMLParser rejects real-life tagsoup

Rene Pijlman at de.nieuwsgroep
Mon Feb 10 18:50:29 EST 2003

I've been using the HTMLParser module to process external web
pages that I don't control. HTMLParser seems to be rather strict
about the HTML syntax and quickly raises an exception when
confronted with real-life tagsoup.

I can't say I blame it. I'm fully aware of the concepts of
language, syntax and parser :-)

Any suggestions on how to handle this? Is there a more liberal
HTML parser available somewhere, one that tries to make the
best of it, as most browsers do?
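
To make the question concrete, here is a minimal sketch of the kind
of processing involved. It assumes Python 3, where the module lives
at html.parser and no longer raises an exception on malformed markup
(this differs from the 2003-era HTMLParser module discussed above).
The LinkCollector class, the extract_links helper and the sample
markup are hypothetical names made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags, ignoring everything else."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(markup):
    """Feed markup to a fresh parser and return the hrefs it found."""
    parser = LinkCollector()
    parser.feed(markup)
    parser.close()  # flush any buffered, unterminated input
    return parser.links


# Hypothetical tag soup: misnested elements, an unquoted attribute,
# and tags that are never closed.
soup = '<p><b>Hello <i>world</b></i><a href=one.html>one<a href="two.html">two'
print(extract_links(soup))
```

The parser here does not try to repair the document tree the way a
browser does; it only reports start tags as it meets them, which is
often enough for extracting links or text from pages of unknown
quality.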

René Pijlman

What do you want to learn?