DOM and HTML

Paul Boddie paul at boddie.org.uk
Sun Apr 2 16:21:27 EDT 2006


Larry Bates wrote:
> robert.differentone at gmail.com wrote:
> >
> > I am looking for any Python library which can help to get DOM
> > tree from HTML.  Is there any way to access HTML DOM, just like
> > accessing it using javascript.

[...]

> Since the browser can't execute anything except Javascript, you

Who said anything about the browser? Accessing a DOM "just like [...]
javascript" can mean a number of things: using an API like the one
JavaScript uses, for example, as well as actually accessing a DOM
associated with a page in a browser.

> can't get to/manipulate the DOM with anything but Javascript code.
> There have been attempts at getting a browser that can execute
> Python code, but I don't think they ever really got anywhere.

Actually, this isn't strictly true either. Disregarding, perhaps
unfairly, recent work on PyXPCOM to integrate Python more tightly with
Mozilla, there are various packages which do access browser DOMs: if
the questioner uses a KDE desktop and isn't averse to installing some
packages, there's qtxmldom [1] which can access the DOM in Konqueror in
association with the kpartplugins distribution [2]; otherwise, I
believe there's a Python package for accessing Internet Explorer's DOM.

And outside browsers, one can still use various packages already
mentioned, in addition to libxml2dom [3] which provides support via
libxml2 for reading HTML and XML, producing a DOM which resembles the
standardised DOM typically available to JavaScript. Nor should it be
forgotten that PyXML also supports HTML parsing [4].
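
As a rough illustration of what "a DOM which resembles the standardised
DOM" can look like from Python, here is a minimal sketch. It is not the
API or the implementation of libxml2dom, PyXML or any other package
mentioned above: it uses only the standard library of a current Python 3
(html.parser and xml.dom.minidom), the names DOMBuilder, parse_html and
VOID_ELEMENTS are made up for the example, and it assumes reasonably
tidy HTML with a single root element.

```python
from html.parser import HTMLParser
from xml.dom.minidom import Document

# Hypothetical helper set: some HTML elements that never take an end tag.
VOID_ELEMENTS = {"br", "hr", "img", "input", "link", "meta"}


class DOMBuilder(HTMLParser):
    """Build an xml.dom.minidom tree from reasonably tidy HTML (a sketch)."""

    def __init__(self):
        super().__init__()
        self.document = Document()
        self.stack = [self.document]  # open nodes, innermost last

    def handle_starttag(self, tag, attrs):
        element = self.document.createElement(tag)
        for name, value in attrs:
            # Attributes without a value arrive as None.
            element.setAttribute(name, value or "")
        self.stack[-1].appendChild(element)
        if tag not in VOID_ELEMENTS:
            self.stack.append(element)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tagName == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # Skip whitespace-only text and any text outside the root element.
        if len(self.stack) > 1 and data.strip():
            self.stack[-1].appendChild(self.document.createTextNode(data))


def parse_html(text):
    """Return a minidom Document for the given HTML string."""
    builder = DOMBuilder()
    builder.feed(text)
    builder.close()
    return builder.document


# Usage: the same method names one would use from JavaScript.
page = parse_html('<html><body><p>See <a href="http://www.python.org/">'
                  'Python</a></p></body></html>')
for link in page.getElementsByTagName("a"):
    print(link.getAttribute("href"), link.firstChild.data)
```

The point is only that getElementsByTagName, getAttribute, firstChild
and friends are available outside a browser; a real HTML parser such as
the one in libxml2 copes with far messier markup than this sketch does.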

Paul

[1] http://www.boddie.org.uk/python/qtxmldom.html
[2] http://www.boddie.org.uk/python/kpartplugins.html
[3] http://www.boddie.org.uk/python/libxml2dom.html
[4] http://www.boddie.org.uk/python/HTML.html



