minidom and pulldom
John J. Lee
jjl at pobox.com
Thu Dec 11 18:15:29 EST 2003
martin at v.loewis.de (Martin v. Löwis) writes:
> pinto at map.com (David Pinto) writes:
>
> > I'm trying to use either the minidom or pulldom to find table tags in
> > html web pages. I've tried parsing two web pages that show up fine in
[...]
> minidom is an XML parser. Most Web pages are not XML, but some form of
> HTML.
>
> You should have better chances with parsing HTML using htmllib.
Or, better, HTMLParser.HTMLParser, which copes better with XHTML
than htmllib does.
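
For illustration, here is a minimal sketch of finding table tags with
HTMLParser. It assumes a current Python, where the class lives in
html.parser rather than the HTMLParser module named above; TableFinder,
find_tables and the sample markup are made-up names and data, not
anything from the original question.

```python
from html.parser import HTMLParser


class TableFinder(HTMLParser):
    """Collect the attributes of every <table> start tag seen."""

    def __init__(self):
        super().__init__()
        self.tables = []  # one dict of attributes per <table> start tag

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag names before calling this method,
        # so <TABLE> and <table> are both reported as "table".
        if tag == "table":
            self.tables.append(dict(attrs))


def find_tables(html_text):
    """Return a list of attribute dicts, one per <table> in html_text."""
    parser = TableFinder()
    parser.feed(html_text)
    parser.close()
    return parser.tables


# Hypothetical, deliberately sloppy HTML: upper-case tag, unquoted
# attribute value, unclosed <td> elements.
sample = (
    "<html><body><TABLE border=1><tr><td>a<td>b</table>"
    "<p>text<table id='t2'></table></body></html>"
)
tables = find_tables(sample)
```

Because this parser only reports events and never builds a tree, markup
that is not well-formed XML does not stop it from seeing the table tags.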
If you don't mind dependencies and want a document tree, a good plan
is to shove everything through mxTidy or uTidylib to generate XHTML,
then use the XML API of your choice.
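
As a rough sketch of the second half of that plan, the following
assumes the tidy step has already been run and has produced well-formed
XHTML; the tidy call itself is not shown. minidom stands in for "the XML
API of your choice", and the xhtml string and table_ids helper are
hypothetical.

```python
from xml.dom import minidom

# Stand-in for the output of the tidy step: well-formed XHTML.
xhtml = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<table id="prices"><tr><td>1</td></tr></table>'
    '<table id="totals"><tr><td>2</td></tr></table>'
    '</body></html>'
)


def table_ids(xhtml_text):
    """Return the id attribute of every <table> element, in document order."""
    doc = minidom.parseString(xhtml_text)
    try:
        return [t.getAttribute("id") for t in doc.getElementsByTagName("table")]
    finally:
        doc.unlink()  # release the DOM tree once we are done with it


ids = table_ids(xhtml)
```

Feeding the same helper un-tidied HTML would normally raise a parse
error, which is the reason for the tidy step in the first place.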
John