Encoding newlines in XML?
Robert Kern
robert.kern at gmail.com
Tue Mar 21 11:24:41 EST 2006
skip at pobox.com wrote:
> *argh!* I hate XML! There, now that that's off my chest...
I think, rather, that you hate XML libraries.
Which is perfectly understandable.
> I am trying to save Python code as attributes of an XML tag with
> xml.dom.minidom machinery. The code, predictably enough, contains newlines.
> If I do nothing to my program text, upon output I get XML which looks like
> this:
>
> <SomeTag text="def _f():
> return 3
> "/>
>
> When that is later parsed, the newlines are replaced by spaces. That's
> clearly no good.
>
> I verified manually that if I changed the above to
>
> <SomeTag text="def _f():&#10; return 3&#10;"/>
>
> when read in, the entities are replaced by newlines and the function is
> restored to its normal indented, multiline self.
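What you are seeing on the parsing side is attribute-value normalization, which the XML spec requires of a conforming parser: a literal newline inside an attribute value is turned into a space, while a newline written as a character reference such as &#10; is kept. A minimal sketch of that difference, assuming a Python 3 standard-library xml.dom.minidom (the variable names are only for illustration):

```python
from xml.dom import minidom

source = "def _f():\n return 3\n"

# Literal newlines inside the attribute value: the parser normalizes
# each one to a single space.
literal_doc = '<SomeTag text="' + source + '"/>'
literal = minidom.parseString(literal_doc).documentElement.getAttribute("text")

# Newlines written as character references survive parsing unchanged.
escaped_doc = '<SomeTag text="' + source.replace("\n", "&#10;") + '"/>'
escaped = minidom.parseString(escaped_doc).documentElement.getAttribute("text")

print(repr(literal))
print(repr(escaped))
```

So the parser is behaving correctly in both cases; the problem is only that the serializer wrote the newlines out unescaped.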
Other libraries seem to get this right.
In [89]: from lxml import etree
In [90]: e = etree.Element('SomeTag', text="def _f():\n return 3\n")
In [93]: e.attrib
Out[93]: {'text': 'def _f():\n return 3\n'}
In [94]: etree.dump(e)
<SomeTag text="def _f():&#10; return 3&#10;"/>
In [96]: etree.dump(etree.XML('<SomeTag text="def _f():&#10; return 3&#10;"/>'))
<SomeTag text="def _f():&#10; return 3&#10;"/>
I'll bet good money that ElementTree also gets this right.
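A quick way to check that bet is a round trip through the serializer and back through the parser. This is only a sketch, assuming the xml.etree.ElementTree that ships with a current Python 3; older standalone ElementTree releases may behave differently.

```python
import xml.etree.ElementTree as ET

source = "def _f():\n return 3\n"

# Extra keyword arguments to Element() become attributes.
elem = ET.Element("SomeTag", text=source)

# Serialize to a str; newlines in attribute values should be written
# as &#10; character references.
serialized = ET.tostring(elem, encoding="unicode")

# Parse the serialized form again and read the attribute back.
roundtrip = ET.fromstring(serialized).get("text")

print(serialized)
print(roundtrip == source)
```

If the serializer escapes the newlines, the attribute read back after parsing is identical to the original source text.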
--
Robert Kern
robert.kern at gmail.com
"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco