[Python-Dev] Encoding of code in XML

Ka-Ping Yee ping@lfw.org
Mon, 17 Apr 2000 17:34:54 -0500 (CDT)


On Mon, 17 Apr 2000, David Ascher wrote:
> 
> The only clean solution I can think of is to define a standard
> encoding/decoding process for storing program code (which may very well
> contain occurences of ]]> in CDATA, which effectively hides that triplet
> from the parser.

Hmm.  I think the way everybody does it is to use the language
to get around the need for ever saying "]]>".  For example, in
Python, if that was outside of a string, you could insert some
spaces without changing the meaning, or if it was inside a string,
you could add two strings together etc.

You're right that this seems a bit ugly, but i think it could be
even harder to get all the language communities to swallow
something like "replace all occurrences of ]]> with some ugly
escape string" -- since the above (hackish) method has the
advantage that you can just run code directly copied from a piece
of CDATA, and now you're asking them all to run the CDATA through
some unescaping mechanism beforehand.

Although i'm less optimistic about the success of such a standard,
i'd certainly be up for it, if we had a good answer to propose.

Here is one possible answer (to pick "@@" as a string very unlikely
to occur much in most scripting languages):

    @@    -->    @@@
    ]]>   -->    @@>

def escape(text):
    cdata = replace(text, "@@", "@@@")
    cdata = replace(cdata, "]]>", "@@>")
    return cdata

def unescape(cdata):
    text = replace(cdata, "@@>", "]]>")
    text = replace(text, "@@@", "@@")
    return text

The string "@@" occurs nowhere in the Python standard library.


Another possible solution:

    <]     -->    <]>
    ]]>    -->    <][

etc.  Generating more solutions is left as an exercise to the
reader. :)



-- ?!ng