[Python-Dev] Encoding of code in XML
Ka-Ping Yee
ping@lfw.org
Mon, 17 Apr 2000 17:34:54 -0500 (CDT)
On Mon, 17 Apr 2000, David Ascher wrote:
>
> The only clean solution I can think of is to define a standard
> encoding/decoding process for storing program code (which may very well
> contain occurences of ]]> in CDATA, which effectively hides that triplet
> from the parser.
Hmm. I think the way everybody does it is to use the language
to get around the need for ever saying "]]>". For example, in
Python, if that was outside of a string, you could insert some
spaces without changing the meaning, or if it was inside a string,
you could add two strings together etc.
You're right that this seems a bit ugly, but i think it could be
even harder to get all the language communities to swallow
something like "replace all occurrences of ]]> with some ugly
escape string" -- since the above (hackish) method has the
advantage that you can just run code directly copied from a piece
of CDATA, and now you're asking them all to run the CDATA through
some unescaping mechanism beforehand.
Although i'm less optimistic about the success of such a standard,
i'd certainly be up for it, if we had a good answer to propose.
Here is one possible answer (to pick "@@" as a string very unlikely
to occur much in most scripting languages):
@@ --> @@@
]]> --> @@>
def escape(text):
cdata = replace(text, "@@", "@@@")
cdata = replace(cdata, "]]>", "@@>")
return cdata
def unescape(cdata):
text = replace(cdata, "@@>", "]]>")
text = replace(text, "@@@", "@@")
return text
The string "@@" occurs nowhere in the Python standard library.
Another possible solution:
<] --> <]>
]]> --> <][
etc. Generating more solutions is left as an exercise to the
reader. :)
-- ?!ng