lxml question

Mark Thomas mark at thomaszone.com
Fri Sep 26 12:54:04 EDT 2008


On Sep 26, 11:19 am, Uwe Schmitt <rocksportroc... at googlemail.com>
wrote:
> I have to parse some text which pretends to be XML. lxml does not want
> to parse it, because it lacks a root element.
> I think that this situation is not unusual, so: is there a way to
> force lxml to parse it?

By "pretends to be XML", do you mean it is XML-like but not actually
well-formed XML?

> My workaround is wrapping the text with "<root>...</root>" before
> feeding lxml's parser.

That's actually not a bad solution, if you know that the document is
otherwise well-formed. Another option is libxml2's "recover" mode,
which lxml exposes through the parser's recover option and which
accommodates non-well-formed XML.

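from lxml import etree
parser = etree.XMLParser(recover=True)  # keep going past well-formedness errors
tree = etree.XML(your_xml_string, parser)  # your_xml_string: the text to parse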

You'll still need to use your wrapper root element, because in recover
mode the parser ignores everything after the first root element closes,
and it does so silently, without raising an error.
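
Putting the two together, here is a minimal sketch of the wrapper plus
recover mode. The names parse_fragments and wrapper_tag are made up for
illustration and are not part of lxml. The fallback to the standard
library's ElementTree is only an assumption for machines without lxml;
it has no recover mode, so there the input must already be well-formed.

```python
try:
    from lxml import etree
    # lxml only: recover=True tolerates non-well-formed input
    parser = etree.XMLParser(recover=True)
except ImportError:
    # Assumed fallback: the stdlib ElementTree shares fromstring(),
    # but has no recover mode, so the default parser is used.
    import xml.etree.ElementTree as etree
    parser = None


def parse_fragments(text, wrapper_tag="root"):
    """Wrap rootless XML-like text in one element and parse it.

    parse_fragments and wrapper_tag are hypothetical names for this sketch.
    """
    # Supply the missing root so every top-level fragment becomes a child
    wrapped = "<%s>%s</%s>" % (wrapper_tag, text, wrapper_tag)
    return etree.fromstring(wrapped, parser)


root = parse_fragments("<a>1</a><b>2</b>")
print([child.tag for child in root])  # ['a', 'b']
```

Note that this only works if the text does not start with an XML
declaration (<?xml ...?>), since wrapping would push the declaration
inside the root element.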

-- Mark.


