Which HTMLParser?

Jarek Zgoda jzgoda at gazeta.usun.pl
Fri Dec 19 16:08:08 EST 2003


Tuang <tuanglen at hotmail.com> wrote:

> Which one is the best choice for parsing arbitrary real-life Web
> pages? I get the feeling that maybe the HTMLParser module is the more
> recent, more practical utility, while the htmllib version is the older
> one, retained for backward compatibility, but I'm not sure. The docs
> don't exactly say that.
> 
> Any recommendations or clarifications of what's going on would be
> helpful.

If you are not sure that your source is valid HTML, use an SGML parser
instead. Personally I recommend F. Lundh's sgmlop -- a fast, robust and
well-written piece of software, a real Meisterstueck (masterpiece). It
works perfectly on Unix, Windows and IBM iSeries (formerly AS/400).
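
To make "parsing pages that may not be valid HTML" concrete, here is a
minimal sketch. It does not use sgmlop, whose API is not described in
this thread. It assumes a current Python 3, where the HTMLParser module
mentioned in the question lives at html.parser. The names LinkCollector,
extract_links and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags (hypothetical helper class)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names before calling us;
        # attrs is a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html_text):
    """Feed a whole document to the parser and return the links found."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


# Deliberately sloppy input: upper-case tags, an unquoted attribute value,
# and several elements that are never closed.
sample = (
    "<HTML><body><p>Hi"
    "<A HREF=http://example.com/a>one"
    '<a href="b.html">two</body>'
)

if __name__ == "__main__":
    print(extract_links(sample))
```

The parser is event-driven and never builds a tree, so missing end tags
do not stop it from reporting the start tags it sees. Whether that is
tolerant enough for a particular set of real-life pages is something to
check against your own data.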

-- 
Jarek Zgoda
Unregistered Linux User # -1
http://www.zgoda.biz/ JID:zgoda at chrome.pl http://zgoda.jogger.pl/



