Processing XML that's embedded in HTML

Mike Driscoll kyosohma at gmail.com
Wed Jan 23 11:15:55 EST 2008


Stefan,

> I would really encourage you to use the normal parser here instead of iterparse().
>
>   from lxml import etree
>   parser = etree.HTMLParser()
>
>   # parse the HTML/XML melange
>   tree = etree.parse(filename, parser)
>
>   # if you want, you can construct a pure XML document
>   row_root = etree.Element("newroot")
>   for row in tree.iterfind("//Row"):
>       row_root.append(row)
>
> In your specific case, I'd encourage using lxml.objectify:
>
> http://codespeak.net/lxml/dev/objectify.html
>
> It will allow you to do this (untested):
>
>   from lxml import etree, objectify
>   parser = etree.HTMLParser()
>   lookup = objectify.ObjectifyElementClassLookup()
>   parser.setElementClassLookup(lookup)
>
>   tree = etree.parse(filename, parser)
>
>   for row in tree.iterfind("//Row"):
>       print row.relationship, row.StartDate, row.Priority * 2.7
>
> Stefan

Both the normal parser example and the objectify example you gave me
fail with the following traceback:

Traceback (most recent call last):
  File "\\clippy\xml_parser2.py", line 70, in -toplevel-
    for row in tree.iterfind("//Row"):
AttributeError: 'etree._ElementTree' object has no attribute 'iterfind'
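The traceback only says that this lxml's _ElementTree lacks iterfind(); findall() is part of the older ElementTree API and should still be there. A minimal sketch of a workaround, assuming that: feature-detect iterfind() and fall back to findall(). The helper iter_matching and the stand-in class FindallOnlyTree are hypothetical, and the demo uses Python's own xml.etree.ElementTree so it runs without lxml. One more thing worth checking: libxml2's HTML parser lowercases element names, so rows parsed with etree.HTMLParser() would need to be searched for as "row", not "Row".

```python
import xml.etree.ElementTree as ET


def iter_matching(tree, tag):
    """Return an iterator over every element named `tag` below the root.

    Hypothetical helper: prefers iterfind() when the tree object has it and
    falls back to findall() for tree objects (such as an older lxml
    _ElementTree) that lack iterfind().
    """
    path = ".//" + tag  # './/' matches the tag at any depth below the root
    if hasattr(tree, "iterfind"):
        return tree.iterfind(path)
    return iter(tree.findall(path))


class FindallOnlyTree:
    """Hypothetical stand-in for a tree API with findall() but no iterfind()."""

    def __init__(self, tree):
        self._tree = tree

    def findall(self, path):
        return self._tree.findall(path)


if __name__ == "__main__":
    # Tag names are lowercase here, as an HTML parser would leave them.
    doc = ET.ElementTree(ET.fromstring(
        "<html><body><row><priority>2</priority></row><p><row/></p></body></html>"))
    for t in (doc, FindallOnlyTree(doc)):
        print(type(t).__name__, len(list(iter_matching(t, "row"))))
```

With lxml the call would look like iter_matching(etree.parse(filename, etree.HTMLParser()), "row"), and it should behave the same whether or not the installed lxml has iterfind().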


Do I need a newer version of lxml for this?
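To see which lxml is actually being imported (in case an older copy is being picked up), a small check along these lines could help. It relies on etree.__version__ and the module's __file__ attribute; probing the private class etree._ElementTree for iterfind is an assumption about lxml internals, and describe_lxml is a hypothetical name.

```python
def describe_lxml():
    """Return (version string, module file, has_iterfind), or None without lxml.

    Hypothetical helper; etree._ElementTree is a private lxml class name.
    """
    try:
        from lxml import etree
    except ImportError:
        return None
    return (
        etree.__version__,
        etree.__file__,
        hasattr(etree._ElementTree, "iterfind"),
    )


if __name__ == "__main__":
    info = describe_lxml()
    if info is None:
        print("lxml is not importable")
    else:
        version, location, has_iterfind = info
        print("lxml", version, "from", location)
        print("ElementTree.iterfind available:", has_iterfind)
```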

Mike



More information about the Python-list mailing list