Fwd: [XML-SIG] xmlpickle.py ?!

Fredrik Lundh <effbot@telia.com>
Wed, 9 Aug 2000 22:12:34 +0200


Tom wrote:
> >    output = string.replace(data, "]]>", "]]]><![CDATA[]>")
>
> Holy cow, /F!  But did you really mean
>
> output = string.replace(data, "]]>", "]]]><![CDATA[]]]>")

nope.  but I didn't make it clear that the idea was to put the
"output" string inside a CDATA section in the first place.

here's how it works:

1. the original "]]>" is split into two parts: "]" and "]>".

2. the "]" is put at the end of the first CDATA section, like this:

    "]" + "]]>"

3. the "]>" is put at the beginning of a second CDATA section,
like this:

    "<![CDATA[" + "]>"

the reason this trick works is that "]]>" is the *only* thing that's
recognized as markup in a CDATA section (see section 2.7 of the
XML spec):

    /.../

    [18]  CDSect ::=  CDStart CData CDEnd
    [19]  CDStart ::=  '<![CDATA['
    [20]  CData ::=  (Char* - (Char* ']]>' Char*))
    [21]  CDEnd ::=  ']]>'

    Within a CDATA section, only the CDEnd string is recognized
    as markup /.../
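
a minimal sketch of the three steps above in current Python, where
str.replace takes the place of string.replace.  the helper names
escape_cdata and wrap_cdata are made up for illustration, and
ElementTree is only used here to check that the text round-trips:

```python
import xml.etree.ElementTree as ET

def escape_cdata(data):
    # split each "]]>": the "]" stays in the current section, "]]>" closes
    # that section, and "<![CDATA[" reopens a new one that starts with "]>"
    return data.replace("]]>", "]]]><![CDATA[]>")

def wrap_cdata(data):
    # the escaped string only makes sense inside CDATA markup
    return "<![CDATA[" + escape_cdata(data) + "]]>"

payload = "if (a[b[0]]>c) { ... }"
doc = "<root>" + wrap_cdata(payload) + "</root>"

# the parser sees two CDATA sections; their contents add up to the payload
parsed = ET.fromstring(doc).text
```

without the surrounding wrap_cdata step the escaped string is not
well-formed on its own, which is what the original one-liner left
implicit.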

:::

also note that

    /.../ CDATA sections cannot nest /.../

doesn't mean that you cannot put a CDStart string inside another
CDATA section (e.g. if you're embedding XML in a CDATA section).
once the parser has started parsing the CDATA section, it will
simply treat any embedded CDStart as character data -- but it will
stop at the first CDEnd string it sees, unless you escape them as
shown above.
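
a small sketch of that behaviour, assuming an expat-based parser such
as the one behind ElementTree.  wrap_cdata is a hypothetical helper
that applies the replacement shown earlier:

```python
import xml.etree.ElementTree as ET

def wrap_cdata(data):
    # hypothetical helper: escape every "]]>" and add the CDATA markup
    return "<![CDATA[" + data.replace("]]>", "]]]><![CDATA[]>") + "]]>"

# a lone CDStart inside a CDATA section is just character data
text = ET.fromstring("<root><![CDATA[<![CDATA[ hello]]></root>").text

# an embedded, unescaped CDEnd closes the outer section early, so the
# trailing "</x>" is parsed as markup and the document is rejected
inner = "<x><![CDATA[hi]]></x>"
try:
    ET.fromstring("<root><![CDATA[" + inner + "]]></root>")
    rejected = False
except ET.ParseError:
    rejected = True

# with the escape applied, the embedded XML survives as plain text
roundtrip = ET.fromstring("<root>" + wrap_cdata(inner) + "</root>").text
```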

:::

one drawback here is that you may end up with more than one
character data segment at the receiving end, so a naive reader
that expects a single segment may mess things up.  but if it
does, it's broken.
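
to illustrate the receiving end, here's a sketch using
xml.parsers.expat directly.  the character data handler can be
called more than once per element (how the text is chunked is up to
the parser), so a reader should collect the pieces and join them
rather than keep only the last one:

```python
import xml.parsers.expat

# "a]]>b" after escaping and wrapping, as described above
doc = "<root><![CDATA[a]]]><![CDATA[]>b]]></root>"

chunks = []      # every piece of character data, in document order
sections = []    # one entry per CDATA section the parser reports

parser = xml.parsers.expat.ParserCreate()
parser.CharacterDataHandler = chunks.append
parser.StartCdataSectionHandler = lambda: sections.append("start")
parser.Parse(doc, True)

# a robust reader joins the pieces; a naive one that overwrites its
# buffer on each callback would lose part of the text
text = "".join(chunks)
```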

</F>