[XML-SIG] Change in reporting of CDATA sections

Andrew Clover and-xml at doxdesk.com
Sat May 22 18:23:00 EDT 2004


Uche Ogbuji <uche.ogbuji at fourthought.com> wrote:

> I disagree, and I use CDATA sections a lot.  Try writing an article
> about XML *in* XML (e.g. XHTML).  You might also become a fan :-)

I think that's the toolchain's job. In an ideal world there'd be an XML 
editor that wasn't awful (!) but it's easy enough with a decent text 
editor to write some XML, select it and encode/decode the offending 
characters.
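
For what it's worth, that encode/decode step is only a couple of lines if your toolchain includes Python. A minimal sketch, assuming the standard library's xml.sax.saxutils is acceptable; the sample markup is made up for illustration:

```python
from xml.sax.saxutils import escape, unescape

# Hypothetical snippet of markup to quote inside an XML article.
sample = '<p class="x">fish & chips</p>'

encoded = escape(sample)      # turns &, < and > into entity references
decoded = unescape(encoded)   # reverses those three replacements

print(encoded)
print(decoded == sample)
```

By default escape() leaves quotes alone, so this is meant for element content rather than attribute values.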

S'what I do, anyway. :-)

> As long as people understand that they're a simple lexical convenience,
> I'm not sure what their harm is.

You're right: at an XML-parsing level they're not too bad, but still 
only a rather minor convenience. The problem is that they add complexity 
without completely solving the problem - if you are writing an XML 
article about CDATA sections, for example, you can't use a literal ']]>'!

> I'm not sure any level of DOM has a sane treatment of CDATA sections

I'm with you here, it's the DOM that's the real problem. Aside from 
CDATA sections defeating text normalisation, the DOM3 rules for 
splitting them at ']]>' and at out-of-encoding characters are an extra 
annoyance and a likely source of bugs for implementations.
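
To make the splitting and normalisation points concrete, here is a rough sketch using Python's xml.dom.minidom. The helper name wrap_cdata and the sample text are hypothetical, and the behaviour shown is minidom's; other DOM implementations may differ:

```python
from xml.dom import minidom

def wrap_cdata(text):
    """Wrap text in CDATA, splitting wherever ']]>' would end the section early."""
    # ']]>' becomes ']]' + end of section + start of a new section + '>'
    return '<![CDATA[' + text.replace(']]>', ']]]]><![CDATA[>') + ']]>'

original = "close a section with ]]> like so"
doc = minidom.parseString('<a>' + wrap_cdata(original) + '</a>')
root = doc.documentElement

# minidom's normalize() merges adjacent Text nodes only, so the two
# CDATASection nodes produced by the split stay separate.
root.normalize()
pieces = [node.data for node in root.childNodes]
print(len(pieces))
print(''.join(pieces) == original)
```

The application wrote one string but has to read back two nodes, which is exactly the sort of thing that trips up code expecting a single text child.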

The legacy nonsense from DTDs is a much worse issue in my book: it turns 
XML from a simple, easy-to-grok-and-knock-up-a-noddy-parser-for notation 
into a maze of twisty little bugs, all alike.

Manifesto for a cleaner XML more suited to simple tasks (ohmygod 
Microsoft want to put XML in the DNS argh etc.):

   - no doctypes
     DTD validation is underpowered, ineffective for namespaces, and
     does not deserve to be part of the basic required XML syntax.
     Validation should be done as a layer on top of XML (Schema, RNG).
   - no entity references
     most common use case, named character escapes: character
     references are almost as convenient, and anyway you should be
     using an encoding that doesn't require you to escape them.
     Further use case, inclusions: use XInclude or a similar
     processing layer on top of XML.
     Entity references are not worth the *enormous* complexity they add
     to the DOM (if implemented completely, anyway)
   - no default attribute values
     how hard is it for an application to take null (or '') for an
     answer?
   - no CDATA sections
     at least at a DOM level
   - no attribute normalisation
     seems to be barely used, and confuses DOM a treat
   - xmlns: declarations on the root element only, unique URIs
     being able to reuse prefixes over the document for e.g. inclusions
     is not worth the pain of namespace fixup and broken interaction
     between DOM1 and DOM2 methods
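
A small illustration of how three of these features change what the application sees without any hint in the document body. This is only a sketch using Python's xml.etree.ElementTree (expat underneath); the document, attribute names and entity name are invented for the example:

```python
import xml.etree.ElementTree as ET

# Hypothetical document whose internal DTD subset declares a default
# attribute value and an internal entity.
with_dtd = (
    '<!DOCTYPE a ['
    '<!ATTLIST a lang CDATA "en">'
    '<!ENTITY who "world">'
    ']>'
    '<a title="two\nlines">hello &who;</a>'
)
root = ET.fromstring(with_dtd)

print(root.get('lang'))          # comes from the ATTLIST default, not the <a> tag
print(root.text)                 # entity reference arrives already expanded
print(repr(root.get('title')))   # literal newline normalised by the parser
```

ElementTree hides the bookkeeping here; a complete DOM has to track which attributes were defaulted and where the entity boundaries were, which is where the complexity comes from.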

any I missed?

Been having a grim day tracking down obscure DOM bugs and interactions, 
hope everyone is having a fun weekend. I'll stop ranting now then.

-- 
Andrew Clover
mailto:and at doxdesk.com
http://www.doxdesk.com/


