DOM with HTML

Tue Jul 1 14:22:08 EDT 2003

> Hi, I need to get a sort of DOM from an HTML page that is declared as XHTML
> but unfortunately is *not* valid XHTML. If I try to parse it with

I use mx.Tidy in such cases, with great success.
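
For readers who want to stay within the standard library, as the original
question asks, the following is only a rough sketch and not the mx.Tidy
approach mentioned above. It assumes a modern Python 3 interpreter (not the
Python 2.2 suite from the thread) and uses html.parser, which tolerates
broken markup, to build an xml.dom.minidom tree whose nodes do provide
.toxml(). The names VOID_TAGS, LenientDomBuilder and parse_html are
hypothetical, chosen for this illustration.

```python
from html.parser import HTMLParser
from xml.dom import minidom

# HTML elements that never take a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class LenientDomBuilder(HTMLParser):
    """Hypothetical helper: builds a minidom tree from sloppy HTML."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.document = minidom.Document()
        self.stack = [self.document]  # open nodes, innermost last

    def handle_starttag(self, tag, attrs):
        parent = self.stack[-1]
        # A DOM document has exactly one root element; drop stray extras.
        if parent is self.document and self.document.documentElement is not None:
            return
        element = self.document.createElement(tag)
        for name, value in attrs:
            # Valueless attributes such as "disabled" arrive with value None.
            element.setAttribute(name, name if value is None else value)
        parent.appendChild(element)
        if tag not in VOID_TAGS:
            self.stack.append(element)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name, together with
        # anything left unclosed inside it; ignore unmatched end tags.
        for index in range(len(self.stack) - 1, 0, -1):
            if self.stack[index].tagName == tag:
                del self.stack[index:]
                break

    def handle_data(self, data):
        parent = self.stack[-1]
        # Text is not allowed directly under the document node.
        if parent is not self.document:
            parent.appendChild(self.document.createTextNode(data))


def parse_html(text):
    """Parse possibly invalid (X)HTML text and return a minidom Document."""
    builder = LenientDomBuilder()
    builder.feed(text)
    builder.close()
    return builder.document


# Markup with unclosed <p> and <b> elements, which expat would reject.
broken = "<html><body><p>First<p>Second <b>bold</body></html>"
dom = parse_html(broken)
body = dom.getElementsByTagName("body")[0]
print(body.toxml())  # string representation of a subtree
```

This sketch does not apply HTML's implied end-tag rules, so the second <p>
ends up nested inside the first one; the result is well-formed but not
necessarily the tree a browser would build. Fetching the page from a URL is
left out on purpose, since parse_html only expects the page text.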

Cheers
Franz

"Alessio Pace" <puccio_13 at yahoo.it> wrote in message
news:3GbMa.4404$FI4.118833 at tornado.fastwebnet.it...
> Hi, I need to get a sort of DOM from an HTML page that is declared as XHTML
> but unfortunately is *not* valid XHTML. If I try to parse it with
> xml.dom.minidom I get an error from expat (as I expected), so I was told to
> try it this way, with a "forgiving" HTML parser:
>
> from xml.dom.ext.reader import HtmlLib
> reader = HtmlLib.Reader()
> dom = reader.fromUri(url)       # 'url' is the address of the web page
>
> FIRST ISSUE:
> It seemed to me, reading the source code in
> $MY_PYTHON_INSTALLATION_DIR/site-packages/_xmlplus/dom/ext/reader/,
> that these are 4DOM APIs, so from what I know of Python distributions they
> are extra packages, or not? I would like to use *only* libs that are
> available in the Python 2.2 suite, not any extras.
>
> SECOND ISSUE:
> If the above libs were included in Python (and so I could continue using
> them), how do I print a string representation of a (sub)tree of the DOM? I
> tried .toxml() as in the XML tutorial, but that method does not exist
> for the FtNode objects that are involved there. Any ideas?
>
> Thanks so much to anyone who can help me.
>
> --
> bye
> Alessio Pace