minidom and pulldom

John J. Lee jjl at pobox.com
Sun Dec 14 19:37:12 EST 2003


pinto at map.com (David Pinto) writes:

> I'm trying to use either the minidom or pulldom to find table tags in
> html web pages.  I've tried parsing two web pages that show up fine in
> my browser, but I get errors when I call minidom.parse, or try to get
> events with pulldom.  Is there a parser that is as forgiving as web
> browsers?

Didn't this get answered just the other day?

minidom and pulldom are built on XML parsers.  HTML is not XML.
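
To make that concrete, here is a minimal sketch using only the standard library. The HTML string is a made-up example of markup a browser happily renders but which is not well-formed XML (the <p> and <br> tags are never closed). The helper name try_minidom is hypothetical.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Made-up, browser-tolerated HTML: <p> and <br> are left unclosed,
# which HTML allows but XML does not.
html = "<html><body><p>Hello<br><table><tr><td>1</td></tr></table></body></html>"

def try_minidom(text):
    """Hypothetical helper: return the parsed Document, or the
    ExpatError message if the XML parser rejects the input."""
    try:
        return minidom.parseString(text)
    except ExpatError as exc:
        return "ExpatError: %s" % exc

result = try_minidom(html)
print(result)  # an ExpatError ("mismatched tag") rather than a Document
```

pulldom sits on the same kind of XML parser, so it stops with a similar error once it reaches the mismatched close tag.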

If you want a tree, I recommend pushing the HTML through mxTidy or
uTidylib, and feeding the resulting XHTML to the XML API of your
choice.
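
A minimal sketch of the second half of that pipeline, assuming the tidy step has already produced well-formed XHTML. The xhtml string below is hand-written to stand in for that output (no mxTidy or uTidylib call is made here), and the id attributes are invented for illustration. Both minidom (whole tree) and pulldom (event stream) can then find the table tags:

```python
from xml.dom import minidom, pulldom

# Hand-written stand-in for tidy's XHTML output: every element is
# closed and empty elements are self-closed, so an XML parser accepts it.
xhtml = """<html xmlns="http://www.w3.org/1999/xhtml">
<body>
<p>Hello<br /></p>
<table id="a"><tr><td>1</td></tr></table>
<table id="b"><tr><td>2</td></tr></table>
</body>
</html>"""

# Tree approach: minidom parses the whole document up front.
doc = minidom.parseString(xhtml)
tree_ids = [t.getAttribute("id") for t in doc.getElementsByTagName("table")]

# Streaming approach: pulldom yields (event, node) pairs; react only
# to start-element events for tables.
stream_ids = []
for event, node in pulldom.parseString(xhtml):
    if event == pulldom.START_ELEMENT and node.tagName == "table":
        stream_ids.append(node.getAttribute("id"))

print(tree_ids)    # ['a', 'b']
print(stream_ids)  # ['a', 'b']
```

If you need the contents of each table in the pulldom loop, calling expandNode(node) on the event stream fills in that one subtree without building the rest of the document.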


John
