[XML-SIG] CDATA sections still not handled

Martin v. Loewis martin@mira.cs.tu-berlin.de
Wed, 17 Jan 2001 22:57:18 +0100


> This translation obviously happens after validation, since invalid xml like
> data in CDATA will never be validated against.  Which is what I want.

I'm telling you: the data in a CDATA section are just character text,
not markup. So no matter what text you put in there, it is always
well-formed and valid (unless it violates the document charset).

> > It seems you are trying to use XML in a way not supported by any
> > standard. If you have a CDATA section, it contains characters by
> > definition; you can't suppose that these characters are markup.
> 
> I don't suppose they are, I know they are.

Maybe in your understanding of how your application should work. Not
in XML.

> 2.7 CDATA Sections
> 
> [Definition: CDATA sections may occur anywhere character data may occur;
> they are used to escape blocks of text containing characters which would
> otherwise be recognized as markup. CDATA sections begin with the
> string "<![CDATA[" and end with the string "]]>":]
> 

> ummm, so can you be clearer about my apparent violation of CDATA by
> putting xml like data in it?

It is completely well-formed to put "xml-like" data into a CDATA
section. However, an application that suddenly "turns" those data into
markup by removing the CDATA markers violates XML; it appears that
your application is supposed to operate in such a way.

IOW, the data might look like xml. When they are in a CDATA section,
they are not markup. Treating them as markup at one point and as
character data at another means reading something into the XML
standard that is not there.
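
As a rough illustration of the point above, here is a minimal sketch
using xml.etree.ElementTree from the Python standard library (which sits
on top of expat). The sample document and the names in it are made up
for this example; they are not from the original discussion.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: an envelope whose CDATA section holds "xml-like"
# text. Note that the inner text is NOT well-formed XML (<item> is never
# closed), yet the document as a whole is.
doc = '<envelope><![CDATA[<order id="7"><item>pen</order>]]></envelope>'

root = ET.fromstring(doc)   # parses without error

children = list(root)       # element children the parser reported
payload = root.text         # character data of <envelope>

print(len(children))        # 0: no <order> element was created
print(payload)              # the literal characters, angle brackets included
```

The parser hands the application a string; whether that string happens
to look like markup is invisible at the XML level.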

> > You need to invent a new markup language for that kind of
> > processing; XML does not support such a kind of interpretation of a
> > document.
> 
> 
> No I don't, because it works fine when the CDATA label are kept, but you are
> also saying that a parser can/should translate the character references
> such as "&lt;", and looking at expat, it does, so, well, it seems to work
> perfectly fine.  

To be precise, I'm saying it can. It might also choose to generate
roughly the same, or even more, CDATA sections on output.
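
A small sketch of why a processor is free to do either, again assuming
xml.etree.ElementTree and a made-up one-element document: the CDATA
form and the character-reference form carry the same character data,
and this particular serializer happens to write the escaped form.

```python
import xml.etree.ElementTree as ET

# Two spellings of the same character data (hypothetical sample input).
with_cdata = ET.fromstring('<m><![CDATA[a < b]]></m>')
with_ref = ET.fromstring('<m>a &lt; b</m>')

# After parsing, the application cannot tell them apart.
same_text = with_cdata.text == with_ref.text

# ElementTree does not remember the CDATA markers; on output it escapes
# the "<" instead. Other tools may choose to emit CDATA sections.
serialized = ET.tostring(with_cdata, encoding='unicode')

print(same_text)
print(serialized)
```

Both outputs describe the same document content, so an application
must not depend on which of the two spellings it receives.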

> But now I am interested why this is a violation.  A perfectly
> acceptable use is that one uses xml to wrap a message, which itself
> may be xml, but it is up to the message interpreter later on to
> figure out if it is valid.

It's not a violation to put "xml like" data into a CDATA section, but
they are just plain character data. What I said was:

# So if you treat CDATA sections in any other way, you violate the XML
# recommendation.

*That* is something you cannot expect to work.

Regards,
Martin