Which HTMLParser?

Rene Pijlman reply.in.the.newsgroup at my.address.is.invalid
Fri Dec 19 19:36:11 EST 2003


Tuang:
>The library docs show that there is an HTMLParser module and an
>htmllib module, both of which apparently contain classes named
>"HTMLParser". There is a bit of description of differences, but it
>still doesn't seem clear to me what the intent is.

I think the intent is to use HTMLParser. It's newer, and its documentation
doesn't scare you off with phrases like "HTML 2.0" and "SGML" :-)
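
For what it's worth, here is a minimal sketch of the subclass-and-override
style that HTMLParser expects. It assumes a modern Python 3, where the
module is spelled html.parser (in Python 2 it was the HTMLParser module).
The names LinkCollector and collect_links are made up for illustration and
are not part of the library.

```python
from html.parser import HTMLParser  # Python 3 name; Python 2 used "from HTMLParser import HTMLParser"


class LinkCollector(HTMLParser):
    """Hypothetical helper: gathers the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names before calling us;
        # attrs is a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def collect_links(html_text):
    """Feed a complete document to the parser and return the hrefs found."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links
```

Calling collect_links() on a string of markup returns the href values in
document order. Other handlers (handle_endtag, handle_data) can be
overridden the same way.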

>Which one is the best choice for parsing arbitrary real-life Web pages?

Neither! Real-life web pages are typically not HTML-parseable. Try tidying
them up a bit first. See http://groups.google.nl/groups?th=58cd394d2e71137f

-- 
René Pijlman



