Encoding newlines in XML?

Robert Kern robert.kern at gmail.com
Tue Mar 21 11:24:41 EST 2006


skip at pobox.com wrote:
> *argh!*  I hate XML!   There, now that that's off my chest...

I think, rather, that you hate XML libraries.

Which is perfectly understandable.

> I am trying to save Python code as attributes of an XML tag with
> xml.dom.minidom machinery.  The code, predictably enough, contains newlines.
> If I do nothing to my program text, upon output I get XML which looks like
> this:
> 
>     <SomeTag text="def _f():
>     return 3
> "/>
> 
> When that is later parsed, the newlines are replaced by spaces.  That's
> clearly no good.
> 
> I verified manually that if I changed the above to
> 
>     <SomeTag text="def _f():&#10;    return 3&#10;"/>
> 
> when read in, the entities are replaced by newlines and the function is
> restored to its normal indented, multiline self. 
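
The parser behaviour skip describes can be reproduced with the standard library. This is a minimal sketch, assuming Python 3 and using `xml.etree.ElementTree` for parsing (minidom is not used here). Under XML 1.0 attribute-value normalization, a literal line break inside an attribute value becomes a space, while a `&#10;` character reference survives as a real newline. `xml.sax.saxutils.quoteattr` is one way to generate the escaped form instead of editing it by hand; the explicit `{"\n": "&#10;"}` mapping is passed to make that intent visible.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

code = "def _f():\n    return 3\n"

# Literal newlines inside the attribute value: a conforming parser
# normalizes each one to a single space, so the function is flattened.
raw = '<SomeTag text="%s"/>' % code
print(repr(ET.fromstring(raw).get("text")))

# quoteattr escapes &, < and >, applies the extra mapping (newline ->
# &#10;), and wraps the result in quotes, giving the hand-edited form.
escaped = "<SomeTag text=%s/>" % quoteattr(code, {"\n": "&#10;"})
print(escaped)

# The character references come back as real newlines when parsed.
print(repr(ET.fromstring(escaped).get("text")))
```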

Other libraries seem to get this right.


In [89]: from lxml import etree

In [90]: e = etree.Element('SomeTag', text="def _f():\n  return 3\n")

In [93]: e.attrib
Out[93]: {'text': 'def _f():\n  return 3\n'}

In [94]: etree.dump(e)
<SomeTag text="def _f():&#10;  return 3&#10;"/>

In [96]: etree.dump(etree.XML('<SomeTag text="def _f():&#10;  return 3&#10;"/>'))
<SomeTag text="def _f():&#10;  return 3&#10;"/>


I'll bet good money that ElementTree also gets this right.
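
One way to check that bet against the ElementTree bundled with a modern Python 3 standard library (not necessarily the 2006 release) is a serialize-then-parse round trip. This is a minimal sketch, assuming Python 3: the element is built through the API, mirroring the lxml session, so the serializer rather than string formatting handles escaping. The `code` value is just the example function from the thread.

```python
import xml.etree.ElementTree as ET

code = "def _f():\n    return 3\n"

# Keyword arguments to Element become attributes, as with lxml above.
elem = ET.Element("SomeTag", text=code)

# Serialize to a str; newlines in the attribute should be written as &#10;.
serialized = ET.tostring(elem, encoding="unicode")
print(serialized)

# Parse the serialized text back and compare with the original string.
restored = ET.fromstring(serialized).get("text")
print(restored == code)
```

If the comparison prints `True`, the newlines survived the trip through XML.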

-- 
Robert Kern
robert.kern at gmail.com

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco
