Processing XML that's embedded in HTML
Mike Driscoll
kyosohma at gmail.com
Wed Jan 23 11:15:55 EST 2008
Stefan,
> I would really encourage you to use the normal parser here instead of iterparse().
>
> from lxml import etree
> parser = etree.HTMLParser()
>
> # parse the HTML/XML melange
> tree = etree.parse(filename, parser)
>
> # if you want, you can construct a pure XML document
> row_root = etree.Element("newroot")
> for row in tree.iterfind("//Row"):
>     row_root.append(row)
>
> In your specific case, I'd encourage using lxml.objectify:
>
> http://codespeak.net/lxml/dev/objectify.html
>
> It will allow you to do this (untested):
>
> from lxml import etree, objectify
> parser = etree.HTMLParser()
> lookup = objectify.ObjectifyElementClassLookup()
> parser.setElementClassLookup(lookup)
>
> tree = etree.parse(filename, parser)
>
> for row in tree.iterfind("//Row"):
>     print row.relationship, row.StartDate, row.Priority * 2.7
>
> Stefan
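For readers without lxml at hand, here is a minimal sketch of both suggestions using Python's standard-library `xml.etree.ElementTree` instead. It assumes the page is well-formed enough to parse as XML; lxml's `HTMLParser` remains the tool for genuinely broken HTML. The sketch copies every `Row` into a new `newroot` document, then reads child values by name and converts `Priority` to a number by hand, which is roughly what objectify's attribute access does for you automatically. The sample markup and the `collect_rows` and `row_values` helpers are hypothetical. One caveat worth checking against real input: lxml's `HTMLParser` is built on libxml2's HTML parser, which lowercases tag names, so with that parser the elements may come back as `row` rather than `Row`.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input: well-formed HTML with XML <Row> records embedded in it.
SAMPLE = """<html><body>
<p>Report</p>
<Row><relationship>parent</relationship><StartDate>2008-01-23</StartDate><Priority>2</Priority></Row>
<Row><relationship>child</relationship><StartDate>2008-01-24</StartDate><Priority>5</Priority></Row>
</body></html>"""


def collect_rows(markup, tag="Row"):
    """Gather every <tag> element under a new <newroot> element (pure XML)."""
    page_root = ET.fromstring(markup)
    new_root = ET.Element("newroot")
    # ".//Row" is the relative form of "//Row"; ElementPath wants the leading dot.
    for row in page_root.iterfind(f".//{tag}"):
        # Unlike lxml, the standard library does not detach the element from
        # its old parent on append; the same element is just referenced twice.
        new_root.append(row)
    return new_root


def row_values(row):
    """Read child text by name; Priority is converted by hand, as objectify would infer."""
    return (
        row.findtext("relationship"),
        row.findtext("StartDate"),
        float(row.findtext("Priority")) * 2.7,
    )


for row in collect_rows(SAMPLE):
    print(*row_values(row))
```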
Both the normal parser example and the objectify example you gave me
produce the following traceback:
Traceback (most recent call last):
  File "\\clippy\xml_parser2.py", line 70, in -toplevel-
    for row in tree.iterfind("//Row"):
AttributeError: 'etree._ElementTree' object has no attribute 'iterfind'
Do I need a newer version of lxml?
Mike
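The traceback says the installed lxml's `_ElementTree` has no `iterfind()` method, which points to an older lxml release than the one the examples were written for; upgrading is the direct fix. As a stopgap, here is a sketch of a hypothetical `find_rows` helper that uses `iterfind()` when it exists and otherwise falls back to `findall()`, assuming the older release still provides `findall()` as ElementTree-compatible APIs generally do. It is shown with the standard library's ElementTree so it runs anywhere, and `LegacyTree` is a hypothetical stand-in for an object that only has `findall()`. It also uses the relative path `.//Row`, since ElementPath treats a leading `//` as an absolute path and may warn about it or reject it.

```python
import xml.etree.ElementTree as ET


def find_rows(tree, path=".//Row"):
    """Return an iterator over matching elements, with or without iterfind()."""
    iterfind = getattr(tree, "iterfind", None)
    if iterfind is not None:
        return iterfind(path)
    # Older ElementTree-style objects may only have findall(), which returns a list.
    return iter(tree.findall(path))


class LegacyTree:
    """Hypothetical stand-in for a tree object that only provides findall()."""

    def __init__(self, wrapped):
        self._wrapped = wrapped

    def findall(self, path):
        return self._wrapped.findall(path)


tree = ET.ElementTree(ET.fromstring("<html><body><Row/><Row/></body></html>"))
print(sum(1 for _ in find_rows(tree)))              # 2, via iterfind()
print(sum(1 for _ in find_rows(LegacyTree(tree))))  # 2, via the findall() fallback
```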