[XML-SIG] CDATA sections still not handled

Mike Orr iron@mso.oz.net
Thu, 18 Jan 2001 08:54:32 -0800


On Thu, Jan 18, 2001 at 10:27:46PM +1300, matt wrote:
> For example, say one wants to transport html. 
> Now html is usually really ugly in that it is hardly ever well formed xml. 
> Escaping it with CDATA is an easy way to hide that, and giving that data to an
> html renderer some time later would be fine.  Being in CDATA, it is never
> parsed for "well formedness".

I was just about to suggest looking at it this way.  Suppose you have a
set of records and a certain tag contains HTML, which you don't want to
un-CDATA-ize because the (human) editor doesn't want to see or type
&lt;H1&gt;.

Three other questions.  Are there certain tags that will always be CDATA,
or does it differ randomly from document to document?  Do you care
whether your application changes the whitespace outside that CDATA
section, making an "equivalent" document?  Or do you want the
indentation and all to remain exactly as it is?

If you know that a certain tag should always be CDATA, and you're
willing to settle for an "equivalent" document otherwise, then maybe
it doesn't matter that the parser normalizes CDATA on input, 
because you can write it out manually and convert that tag body to CDATA.
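
A minimal sketch of that "write it out as CDATA yourself" step, assuming
Python's xml.dom.minidom and assuming the tag body is plain character data
with no child elements.  The function name wrap_in_cdata and the tag name
"body" are made up for illustration.  A literal "]]>" cannot appear inside
one CDATA section, so the text is split across adjacent sections there.

```python
from xml.dom import minidom


def cdata_chunks(text):
    """Split text so that no chunk contains the terminator "]]>"."""
    parts = text.split("]]>")
    last = len(parts) - 1
    # Give "]]" back to the chunk before each split and ">" to the one after.
    return [(">" if i > 0 else "") + part + ("]]" if i < last else "")
            for i, part in enumerate(parts)]


def wrap_in_cdata(doc, tag_name):
    """Rewrite the character data of every <tag_name> element as CDATA."""
    for elem in doc.getElementsByTagName(tag_name):
        # Collect the text the parser delivered (escaped or CDATA alike).
        text = "".join(node.data for node in elem.childNodes
                       if node.nodeType in (node.TEXT_NODE,
                                            node.CDATA_SECTION_NODE))
        # Assumption: the element holds only character data, so dropping
        # every child loses nothing.
        while elem.firstChild is not None:
            elem.removeChild(elem.firstChild)
        for chunk in cdata_chunks(text):
            elem.appendChild(doc.createCDATASection(chunk))


doc = minidom.parseString(
    "<records><rec><body>&lt;H1&gt;Hi&lt;/H1&gt;</body></rec></records>")
wrap_in_cdata(doc, "body")
print(doc.documentElement.toxml())
```

The result is only an "equivalent" document: whitespace and the original
choice of CDATA versus escaping in other tags are whatever the serializer
produces.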

If the CDATA sections will be coming in at random and you must leave
the document formatted exactly as it is (minus whatever changes your
application is supposed to be making to it), then perhaps you need a
lower-level parser than full XML.  Perhaps then you'll want to consider
modifying one of the existing XML parser classes or the sgmllib parser
to fit your needs.
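
As one possible starting point at a lower level, here is a rough sketch
using the expat bindings (xml.parsers.expat) rather than a modified parser
class or sgmllib.  expat reports where CDATA sections start, so you can at
least record which tags arrived with CDATA.  The function name
find_cdata_tags is made up for illustration, and this does not by itself
preserve the original whitespace or formatting.

```python
from xml.parsers import expat


def find_cdata_tags(xml_text):
    """Return the names of elements that directly contain a CDATA section."""
    open_tags = []      # stack of currently open element names
    found = set()

    def start_element(name, attrs):
        open_tags.append(name)

    def end_element(name):
        open_tags.pop()

    def start_cdata():
        # A CDATA section can only occur inside an element, so the
        # stack is never empty here for a well-formed document.
        found.add(open_tags[-1])

    parser = expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.EndElementHandler = end_element
    parser.StartCdataSectionHandler = start_cdata
    parser.Parse(xml_text, True)
    return found


sample = "<rec><body><![CDATA[<H1>Hi</H1>]]></body><note>&lt;b&gt;</note></rec>"
print(find_cdata_tags(sample))
```

The set it returns could then drive the output side, so that only the tags
that came in as CDATA go back out as CDATA.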

-- 
-Mike (Iron) Orr, iron@mso.oz.net  (if mail problems: mso@jimpick.com)
   http://mso.oz.net/     English * Esperanto * Russkiy * Deutsch * Espan~ol