html parser?

Wed Oct 19 05:56:14 EDT 2005

Thorsten Kampe wrote:

>* Christoph Söllner (2005-10-18 12:20 +0100)
>>right, that's what I was looking for. Thanks very much.
>
>For simple things like that "BeautifulSoup" might be overkill.
>
>import formatter, htmllib, urllib
>
>url = 'http://python.org' 
>
>htmlp = htmllib.HTMLParser(formatter.NullFormatter()) 
>  
>
The problem with HTMLParser is that it does not handle unclosed tags
and/or attributes given with invalid syntax.
Unfortunately, many sites on the internet serve malformed HTML pages.
You are right, BeautifulSoup is overkill (it is rather slow), but I'm
afraid it is the only fault-tolerant solution.
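
To make "malformed" concrete, here is a small sketch of the kind of input
meant: tags that are never closed and an attribute value without quotes.
It is only an illustration under stated assumptions. It is written for
Python 3, where htmllib is no longer available, so it uses the standard
library's html.parser module rather than the htmllib.HTMLParser quoted
above. The LinkCollector class, the collect_links helper and the sample
string are made up for this example. That it copes with this one string
says nothing about how it behaves on an arbitrary real-world page.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: gather href values from <a> start tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Made-up sample: <p>, <b> and the first <a> are never closed,
# and the second href value is not quoted.
MALFORMED = (
    '<p>See <a href="http://python.org">Python<b> and '
    '<a href=http://example.com/x>this</a>'
)


def collect_links(html):
    """Feed the markup to the parser and return the collected links."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    for link in collect_links(MALFORMED):
        print(link)
```

Running the same kind of input through whichever parser you plan to use
is a cheap way to find out whether it is tolerant enough for your pages
before paying the speed cost of BeautifulSoup.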

  Les