html parser?
Laszlo Zsolt Nagy
gandalf at designaproduct.biz
Wed Oct 19 05:56:14 EDT 2005
Thorsten Kampe wrote:
>* Christoph Söllner (2005-10-18 12:20 +0100)
>
>
>>right, that's what I was looking for. Thanks very much.
>>
>>
>
>For simple things like that "BeautifulSoup" might be overkill.
>
>import formatter, \
> htmllib, \
> urllib
>
>url = 'http://python.org'
>
>htmlp = htmllib.HTMLParser(formatter.NullFormatter())
>
>
The problem with HTMLParser is that it does not handle unclosed tags
and/or attributes given with invalid syntax.
Unfortunately, many sites on the internet serve malformed HTML pages. You
are right, BeautifulSoup is overkill for simple tasks
(it is rather slow), but I'm afraid it is the only fault-tolerant
solution.
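
For comparison, here is a minimal sketch of pulling links out of a page
with unclosed tags and an unquoted attribute. It assumes a current
Python 3, whose html.parser module is a different and more lenient parser
than the htmllib.HTMLParser quoted above. The names LinkCollector and
extract_links and the sample markup are made up for illustration. It only
reacts to start tags and does not build a document tree the way
BeautifulSoup does.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> start tags.

    Only start tags are inspected, so it does not matter whether the
    tags are ever closed.
    """

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(markup):
    """Return all href values found in the given HTML string."""
    parser = LinkCollector()
    parser.feed(markup)
    parser.close()
    return parser.links


# Hypothetical sample: <p> and both <a> tags are never closed,
# and the second href is not quoted.
broken = (
    '<html><body><p>Unclosed paragraph'
    '<a href="http://python.org">Python'
    '<a href=docs.html>Docs</body>'
)

print(extract_links(broken))
```

If you need the actual nesting of the elements rather than a flat list of
attributes, this approach is not enough and a tree-building parser such
as BeautifulSoup is still the safer choice.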
Les