HTMLParser rejects real-life tagsoup

Rene Pijlman reageer.in at de.nieuwsgroep
Wed Feb 12 17:09:38 EST 2003


Gerhard Häring:
>Rene Pijlman wrote:
>> I've been using the HTMLParser module to process external web
>> pages that I don't control. HTMLParser seems to be rather strict
>> [...]
>> Any suggestions on how to handle this? [...]
>
>I'd try tidying up the HTML first:
>http://www.lemburg.com/files/python/mxTidy.html

Great idea, it works fine now. Thanks!
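
For anyone finding this thread later, here is a minimal sketch of the
tidy-first approach. It assumes Python 3, where the HTMLParser module lives
at html.parser. The names LinkCollector and extract_links, and the `tidy`
hook, are made up for illustration; the hook stands in for whatever cleaner
you use (mxTidy or otherwise) and does not reflect any particular library's
API.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> start tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html, tidy=None):
    """Optionally clean the markup first, then parse it.

    `tidy` is a hypothetical hook: any callable that takes a str and
    returns a str, e.g. a thin wrapper around an HTML cleaner.
    """
    if tidy is not None:
        html = tidy(html)
    parser = LinkCollector()
    parser.feed(html)
    parser.close()  # flush any buffered, unterminated input
    return parser.links
```

Usage would look something like extract_links(page_source, tidy=my_cleaner),
where my_cleaner is your own wrapper. Keeping the cleanup step as a separate
callable means the parser subclass does not need to know which cleaner is
installed.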

-- 
René Pijlman

What do you want to learn?  http://www.leren.nl



