trying to parse non valid html documents with HTMLParser
florent
florent.newsgroups at kynesthesy.org
Wed Aug 3 05:44:17 EDT 2005
> AFAIK not with HTMLParser or htmllib. You might try (if you haven't done
> yet) htmllib and see, which parser is more forgiving.
Thanks, I'll try htmllib.
In any case, I found a workaround. Feeding data to the HTMLParser in
chunks, obtained from the string with string.split("<"), means I lose
only one tag at a time when an exception is raised!
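
Here is a rough sketch of what I mean, written against Python 3's
html.parser rather than the old HTMLParser module. The names
feed_in_chunks and TagCollector are made up for illustration. Since the
exact exception type depends on the Python version, the sketch catches
Exception broadly, and TagCollector raises on a "bad" tag only to
simulate a parse failure.

```python
from html.parser import HTMLParser


def feed_in_chunks(parser, text):
    """Feed text to parser one "<"-delimited chunk at a time.

    Returns the number of chunks that were dropped because the
    parser raised an exception while handling them.
    """
    skipped = 0
    pieces = text.split("<")
    for index, piece in enumerate(pieces):
        # split() removes the "<", so put it back on every piece
        # except the first (which is the text before the first tag).
        chunk = piece if index == 0 else "<" + piece
        if not chunk:
            continue
        try:
            parser.feed(chunk)
        except Exception:
            # Discard the buffered chunk so the next feed() starts clean.
            parser.reset()
            skipped += 1
    try:
        parser.close()
    except Exception:
        parser.reset()
        skipped += 1
    return skipped


class TagCollector(HTMLParser):
    """Hypothetical parser that records the start tags it sees."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if tag == "bad":
            # Simulated failure, standing in for a real parse error.
            raise ValueError("cannot handle <bad>")
        self.tags.append(tag)


collector = TagCollector()
lost = feed_in_chunks(collector, "<p>one<bad>two<b>three</b></p>")
print(collector.tags, lost)
```

Note that a failing chunk takes its trailing text with it ("two" in the
example above), and reset() also forgets any state the parser kept from
earlier chunks.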