[Python-Dev] Encoding of code in XML

Sjoerd Mullender sjoerd@oratrix.nl
Wed, 19 Apr 2000 21:24:31 +0200


On Wed, Apr 19 2000 "David Ascher" wrote:

> > What is wrong with encoding ]]> the XML way, by using an extra
> > CDATA section?  In other words, split the CDATA section in two in
> > the middle of the ]]> sequence:
> >
> > def encode_cdata(text):
> >     # Break every ']]>' across two CDATA sections so that the
> >     # terminator never appears intact inside a section.
> >     return '<![CDATA[' + text.replace(']]>', ']]]]><![CDATA[>') + ']]>'
> 
> If I understand what you're proposing, you're splitting a single bit of
> Python code into N XML elements.  This requires smarts not on the decode
> function (where they should be, IMO), but on the XML parsing stage (several
> leaves of the tree have to be merged).  Seems like the wrong direction to
> push things.  Also, I can imagine cases where the app puts several scripts
> in consecutive CDATA elements (assuming that's legal XML), and where a merge
> which inserted extra ]]> would be very surprising.
> 
> Maybe I'm misunderstanding you, though....

I think you're not misunderstanding me, but maybe you are
misunderstanding XML.  :-)
[Of course, it is also conceivable that I misunderstand XML. :-]
First of all, I don't propose to split up the single bit of Python
into multiple XML elements.  CDATA sections are not XML elements.  The
XML standard says this:
	CDATA sections may occur anywhere character data may occur;
	they are used to escape blocks of text containing characters
	which would otherwise be recognized as markup.
	[http://www.w3.org/TR/REC-xml#sec-cdata-sect]
In other words, according to the XML standard, wherever you are
allowed to put character data (such as, in this case, Python code),
you are allowed to use CDATA sections.  Their purpose is to escape
blocks of text containing characters that would otherwise be
recognized as markup.  The text inside a CDATA section is character
data, not markup (only the delimiters are markup), so the XML parser
is allowed to coalesce multiple CDATA sections and other character
data into one string before it gives it to the application.
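
As a rough illustration of that coalescing, here is a minimal sketch
that assumes Python's standard xml.etree.ElementTree module as the
parser (an assumption of this example, not something the thread names).
The <script> element name and the sample source string are made up for
the demonstration.  It encodes a string containing ]]> with the
split-CDATA trick and reads it back as one piece:

```python
import xml.etree.ElementTree as ET


def encode_cdata(text):
    # Break every ']]>' across two CDATA sections so that the
    # terminator never appears intact inside a section.
    return '<![CDATA[' + text.replace(']]>', ']]]]><![CDATA[>') + ']]>'


# Hypothetical Python snippet that happens to contain ']]>' as well
# as characters ('<' and '&') that would otherwise need escaping.
source = "if a[b[0]]>1: print('x < y & z')"

# Hypothetical wrapper element; any element allowing character data works.
document = '<script>' + encode_cdata(source) + '</script>'

# This parser hands back the two CDATA sections as a single string.
decoded = ET.fromstring(document).text
```

With this parser no decode function is needed on the application
side: the text it returns is already the original code.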

So, yes, this requires smarts in the XML parsing stage, but I think
those smarts need to be there anyway, since CDATA sections and
ordinary character data may be mixed freely.

If an application put several pieces of Python code in one character
data section, it is basically on its own.  I don't think XML
guarantees that those pieces aren't merged into one string by the XML
parser before it gets to the application.
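
A small sketch of that situation, again assuming
xml.etree.ElementTree as the parser and using made-up <scripts> and
<script> element names.  It shows the behaviour of this one parser,
not a guarantee of the XML standard: two scripts placed in
consecutive CDATA sections arrive merged, while scripts placed in
separate elements stay apart.

```python
import xml.etree.ElementTree as ET

# Two scripts in consecutive CDATA sections inside one element.
merged = ET.fromstring(
    '<scripts><![CDATA[print(1)]]><![CDATA[print(2)]]></scripts>'
)

# The same two scripts, each wrapped in its own element.
separate = ET.fromstring(
    '<scripts>'
    '<script><![CDATA[print(1)]]></script>'
    '<script><![CDATA[print(2)]]></script>'
    '</scripts>'
)

# The application sees one string here; the section boundary is gone.
merged_text = merged.text

# Element boundaries are markup, so these remain two separate strings.
separate_texts = [child.text for child in separate]
```

An application that needs to keep several scripts distinct would
therefore want to rely on elements, not on CDATA section boundaries.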

As I said already, this is my interpretation of XML, and I could be
misinterpreting things.

-- Sjoerd Mullender <sjoerd.mullender@oratrix.com>