trying to parse non valid html documents with HTMLParser

Benji York benji at benjiyork.com
Tue Aug 2 16:29:56 EDT 2005


florent wrote:
> I'm trying to parse HTML documents from the web, using the HTMLParser
> class of the HTMLParser module (Python 2.3), but some web documents are
> not fully valid.

From http://www.crummy.com/software/BeautifulSoup/:

     You didn't write that awful page. You're just trying to get
     some data out of it. Right now, you don't really care what
     HTML is supposed to look like.

     Neither does this parser.
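
For illustration, here is a minimal sketch of pulling links out of
deliberately broken markup. It rests on assumptions that are not part of
the quote above: a current Python 3, and Beautiful Soup installed under
its present-day package name bs4. The names extract_links,
_LinkCollector and BROKEN are made up for this example. If bs4 is not
installed, the sketch falls back to the standard library's html.parser,
which in current Python 3 releases also tolerates malformed input.

```python
from html.parser import HTMLParser

try:
    # Third-party package (assumption: installed as "beautifulsoup4").
    from bs4 import BeautifulSoup
except ImportError:
    BeautifulSoup = None


class _LinkCollector(HTMLParser):
    """Hypothetical stdlib fallback: collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html):
    """Return the href of every <a> tag, in document order."""
    if BeautifulSoup is not None:
        # Use the pure-Python builder so no extra parser library is needed.
        soup = BeautifulSoup(html, "html.parser")
        return [a["href"] for a in soup.find_all("a", href=True)]
    collector = _LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


# Unclosed <p>, <a> and <li>, an unquoted attribute, a stray </b>,
# and no closing </html>.
BROKEN = (
    '<html><body><p>Unclosed paragraph<a href="/one">first'
    "<li><a href=/two>second</b></body>"
)

if __name__ == "__main__":
    print(extract_links(BROKEN))
```

Neither code path insists on well-formed input, so the unclosed tags and
the stray </b> do not stop the links from being collected.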
--
Benji York



