Which HTMLParser?
Jarek Zgoda
jzgoda at gazeta.usun.pl
Fri Dec 19 16:08:08 EST 2003
Tuang <tuanglen at hotmail.com> wrote:
> Which one is the best choice for parsing arbitrary real-life Web
> pages? I get the feeling that maybe the HTMLParser module is the more
> recent, more practical utility, while the htmllib version is the older
> one, retained for backward compatibility, but I'm not sure. The docs
> don't exactly say that.
>
> Any recommendations or clarifications of what's going on would be
> helpful.
If you are not sure that your source is valid HTML, use an SGML parser
instead. Personally I recommend F. Lundh's sgmlop -- a fast, robust and
well-written piece of software, a real Meisterstueck. It works perfectly
on Unix, Windows and IBM iSeries (formerly AS/400).
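
To show what "tolerant" parsing of real-life pages looks like in practice,
here is a minimal sketch. It does not use sgmlop (whose API I am not
reproducing here); it uses the standard library's event-driven parser under
its Python 3 name, html.parser.HTMLParser, which is the successor of the
HTMLParser module asked about. The class name LinkCollector, the helper
extract_links and the sample markup are made up for illustration.

```python
# Python 3 spelling; in the Python 2 of 2003 this class lived in the
# top-level HTMLParser module.
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical example class: collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser passes the tag name lowercased and the attributes
        # as a list of (name, value) pairs with lowercased names.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(markup):
    """Feed markup to the parser and return the hrefs in document order."""
    parser = LinkCollector()
    parser.feed(markup)
    parser.close()
    return parser.links


# Deliberately sloppy input: unclosed <p>, upper-case tag and attribute
# names, an unquoted attribute value, and a bare <br>.
sample = (
    '<p>Unclosed paragraph <a href="http://www.python.org/">Python</a><br>'
    '<A HREF=docs.html>docs</A>'
)

if __name__ == "__main__":
    print(extract_links(sample))
```

The parser only reports events (start tag, end tag, data) and does not
check nesting, which is why the unclosed <p> above causes no trouble. An
SGML-style parser such as sgmlop follows the same event-driven idea.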
--
Jarek Zgoda
Unregistered Linux User # -1
http://www.zgoda.biz/ JID:zgoda at chrome.pl http://zgoda.jogger.pl/