Processing XML that's embedded in HTML

Tue Jan 22 17:36:22 EST 2008

On 22 Jan, 21:48, Mike Driscoll <kyoso... at gmail.com> wrote:
> On Jan 22, 11:32 am, Paul Boddie <p... at boddie.org.uk> wrote:
>
> > [1]http://www.python.org/pypi/libxml2dom
>
> I must have tried this module quite a while ago since I already have
> it installed. I see you're the author of the module, so you can
> probably tell me what's what. When I do the above, I get an empty list
> either way. See my code below:
>
> import libxml2dom
> d = libxml2dom.parse(filename, html=1)
> rows = d.xpath('//XML[@id="grdRegistrationInquiryCustomers"]/BoundData/Row')
> # rows = d.xpath("//XML/BoundData/Row")
> print rows

It may be namespace-related, although parsing as HTML shouldn't impose
namespaces on the document, unlike parsing XHTML, say. One thing you
can try is to start with a simpler query and to expand it. Start with
the expression "//XML" and add things to make the results more
specific. Generally, namespaces can make XPath queries awkward because
you have to qualify the element names and define the namespaces for
each of the prefixes used.
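
As a rough sketch of that approach, the following builds the query up one step at a time and counts the matches at each step, so the first step that returns nothing stands out. It uses the standard library's xml.etree.ElementTree rather than libxml2dom, purely so that it runs without extra packages. The sample markup, the namespace URI "urn:example:grid", the prefix "g" and the helper count_matches are all invented for illustration; the real document isn't shown in this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data standing in for the real page.
SAMPLE = """
<html>
  <body>
    <XML id="grdRegistrationInquiryCustomers">
      <BoundData>
        <Row><Name>first</Name></Row>
        <Row><Name>second</Name></Row>
      </BoundData>
    </XML>
  </body>
</html>
""".strip()

# The same data with a default namespace on the XML island.
# The URI is an assumption made up for this example.
SAMPLE_NS = SAMPLE.replace("<XML ", '<XML xmlns="urn:example:grid" ')

# Start simple, then add one step at a time.
QUERIES = [
    ".//XML",
    ".//XML[@id='grdRegistrationInquiryCustomers']",
    ".//XML[@id='grdRegistrationInquiryCustomers']/BoundData",
    ".//XML[@id='grdRegistrationInquiryCustomers']/BoundData/Row",
]

# Namespace-qualified equivalents: every element name gets a prefix,
# and the prefix is mapped to the namespace URI.
NS = {"g": "urn:example:grid"}
QUERIES_NS = [
    ".//g:XML",
    ".//g:XML[@id='grdRegistrationInquiryCustomers']",
    ".//g:XML[@id='grdRegistrationInquiryCustomers']/g:BoundData",
    ".//g:XML[@id='grdRegistrationInquiryCustomers']/g:BoundData/g:Row",
]


def count_matches(root, queries, namespaces=None):
    """Return (query, number of matches) pairs for each query in turn."""
    return [(q, len(root.findall(q, namespaces))) for q in queries]


# No namespaces: the unqualified queries match.
plain = count_matches(ET.fromstring(SAMPLE), QUERIES)

# Namespaced data, unqualified queries: nothing matches.
unqualified = count_matches(ET.fromstring(SAMPLE_NS), QUERIES)

# Namespaced data, qualified queries: the matches come back.
qualified = count_matches(ET.fromstring(SAMPLE_NS), QUERIES_NS, NS)

for label, results in [("plain", plain),
                       ("unqualified", unqualified),
                       ("qualified", qualified)]:
    for query, count in results:
        print("%-12s %3d  %s" % (label, count, query))
```

If even the simplest query comes back empty, the problem lies in how the element is named in the parsed document (a namespace being one possible cause) rather than in the later steps of the path.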

Let me know how you get on!

Paul