minidom and pulldom

Thu Dec 11 18:15:29 EST 2003

martin at v.loewis.de (Martin v. Löwis) writes:

> pinto at map.com (David Pinto) writes:
> 
> > I'm trying to use either the minidom or pulldom to find table tags in
> > html web pages.  I've tried parsing two web pages that show up fine in
[...]
> minidom is an XML parser. Most Web pages are not XML, but some form of
> HTML.
> 
> You should have better chances with parsing HTML using htmllib.

Or, better, HTMLParser.HTMLParser, which works better with XHTML than
htmllib does.
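
For the original question (finding table tags), a minimal sketch with
HTMLParser might look like the following. The names TableFinder and
find_tables are made up for illustration, and the import fallback
assumes the module is called html.parser on Python 3 and HTMLParser on
Python 2.

```python
try:
    from html.parser import HTMLParser   # Python 3 module name
except ImportError:
    from HTMLParser import HTMLParser    # Python 2 module name


class TableFinder(HTMLParser):
    """Collect the attributes of every <table> start tag (hypothetical helper)."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.tables = []

    def handle_starttag(self, tag, attrs):
        # The parser passes tag names in lower case, so <TABLE> matches too.
        if tag == "table":
            self.tables.append(dict(attrs))


def find_tables(html_text):
    """Return one attribute dict per <table> tag found in html_text."""
    finder = TableFinder()
    finder.feed(html_text)
    finder.close()
    return finder.tables
```

This only reports start tags as they stream past; it does not build a
tree, so nesting and cell contents would need extra bookkeeping in
handle_endtag and handle_data.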

If you don't mind dependencies and want a document tree, a good plan
is to shove everything through mxTidy or uTidylib to generate XHTML,
then use the XML API of your choice.
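
As a rough sketch of the second half of that plan, here is minidom
picking table elements out of a document that is already well-formed
XHTML. The Tidy step itself is not shown; the XHTML string below is a
hand-written stand-in for its output, and table_ids is a hypothetical
helper name.

```python
from xml.dom import minidom

# Hand-written stand-in for what a Tidy pass might produce (assumption).
XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Example</title></head>
  <body>
    <table id="outer"><tr><td>
      <table id="inner"><tr><td>cell</td></tr></table>
    </td></tr></table>
  </body>
</html>"""


def table_ids(xhtml_text):
    """Return the id attribute of every table element, in document order."""
    doc = minidom.parseString(xhtml_text)
    try:
        tables = doc.getElementsByTagName("table")
        return [t.getAttribute("id") for t in tables]
    finally:
        doc.unlink()  # release the DOM tree once we are done with it


ids = table_ids(XHTML)
```

getElementsByTagName searches all descendants, so nested tables are
found as well as top-level ones.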

John