libxml2dom - parsing maligned html

Paul Boddie paul at boddie.org.uk
Tue Aug 26 11:47:32 EDT 2008


On 26 Aug, 17:28, "bruce" <bedoug... at earthlink.net> wrote:
> so it's as if the parseString only reads the initial "html" tree. i've
> reviewed as much as i can find regarding libxml2dom to try to figure out how
> i can get it to read/parse/handle both html trees/nodes.

It might be possible to have libxml2 read directly from a file
descriptor and stop after parsing the first document, leaving the
descriptor open for the next one; however, libxml2dom doesn't
currently support this. Another possibility is to feed text to libxml2
until it can return a well-formed document, which I do as part of the
libxml2dom.xmpp module, but I don't really support this feature in the
public API.
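
Until something like that is available, one workaround is to split the
text into separate documents before parsing. The sketch below is not
part of libxml2dom and is simpler than either approach described above:
it uses only the standard re module and assumes each document ends with
a literal closing </html> tag. The function name split_html_documents
is hypothetical, chosen for illustration.

```python
import re

# Matches a closing </html> tag, tolerating case differences and
# whitespace before the final ">".
_HTML_END = re.compile(r"</html\s*>", re.IGNORECASE)


def split_html_documents(text):
    """Split text holding concatenated HTML documents into a list of
    strings, one per document (hypothetical helper, not a libxml2dom API)."""
    documents = []
    start = 0
    for match in _HTML_END.finditer(text):
        # Each document runs from the end of the previous closing tag
        # up to and including this closing tag.
        chunk = text[start:match.end()].strip()
        if chunk:
            documents.append(chunk)
        start = match.end()
    # Keep trailing text that has no closing tag, so nothing is
    # silently dropped.
    tail = text[start:].strip()
    if tail:
        documents.append(tail)
    return documents
```

Each returned string could then be passed to parseString on its own.
This is a plain text split, not a parse, so a "</html>" appearing
inside a comment or script would cause a wrong split.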

Again, improvements to libxml2dom may happen if I find the time to do
them.

Paul


