create a DOM from an HTML document

Paul Boddie paul at boddie.org.uk
Thu Feb 9 16:50:55 EST 2006


John J. Lee wrote:
> Mark Harrison <mh at pixar.com> writes:
>
> > Ahh, it's BeautifulSoup...
>
> Strictly that's not THE DOM, just A document object model.  The DOM
> proper is a standardised interface, which BeautifulSoup does not
> implement.  You could build a DOM using BeautifulSoup, though.

For a certain value of standardised, libxml2dom provides "the DOM" for
HTML:

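# Python 2 code (urllib.urlopen and the print statement).
import urllib, libxml2dom
f = urllib.urlopen("http://www.python.org")
s = f.read()
f.close()
# html=1 asks for the HTML parser instead of the XML parser.
d = libxml2dom.parseString(s, html=1)
print "There are", len(d.xpath("//table")), "tables in the document."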

See http://www.python.org/pypi/libxml2dom for more information.
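
To illustrate what "standardised interface" means here, the following is a minimal sketch that uses the standard library's xml.dom.minidom rather than libxml2dom. It counts tables with the DOM call getElementsByTagName instead of XPath. The sample markup and the helper names count_tables and first_cell_text are made up for illustration. Note that minidom only parses well-formed XML, so it is not a substitute for an HTML parser on real-world pages.

```python
# Sketch only: standard DOM calls via the stdlib's minidom, not libxml2dom.
from xml.dom import minidom

# Hypothetical sample markup: well-formed XHTML, not fetched from the web.
SAMPLE = """<html><body>
<table><tr><td>one</td></tr></table>
<table><tr><td>two</td></tr></table>
</body></html>"""


def count_tables(markup):
    """Parse the markup and count <table> elements with a DOM method."""
    document = minidom.parseString(markup)
    return len(document.getElementsByTagName("table"))


def first_cell_text(markup):
    """Return the text of the first <td>, walking standard DOM attributes."""
    document = minidom.parseString(markup)
    cell = document.getElementsByTagName("td")[0]
    # firstChild is the Text node inside the cell; nodeValue is its string.
    return cell.firstChild.nodeValue


if __name__ == "__main__":
    print("There are %d tables in the document." % count_tables(SAMPLE))
```

Code written against these DOM method and attribute names should, in principle, carry over to other implementations of the same interface with few changes, which is the point of having a standard.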

Paul



