[Expat-discuss] Handling arbitrary bytes in CDATA marked sections...

David Crowley dcrowley@scitegic.com
Thu Dec 6 09:46:10 2001


At 11:34 PM 12/5/2001, Andrew.Nesbit@CSIRO.AU wrote:
>Hi ho, if somebody could help me with this problem, I'd really be 
>appreciative :-)
>
>Basically, what I want to do is parse a document which includes CDATA 
>marked sections. The thing is that I want the CDATA marked sections to be 
>able to contain arbitrary 8-bit bytes (i.e. binary data). I do realise 
>that this makes the document a non-XML document, but I do not want to have 
>to use any encoding system on it. I need to read these bytes raw, (i.e. 
>not cook them into UTF-8 or anything), so they can be stored in an array 
>of unsigned chars or shorts or something.
>
>Can somebody please give me some hints on how I can do this?


This should be a FAQ.  Do yourself a favor and base64 encode it.  It's 
really not hard, it's not slow, it keeps the document well-formed XML, it 
makes the representation only about 33% larger (4 output characters for 
every 3 input bytes), and you don't have to go making ugly hacks in the 
code that nobody else is interested in.  If you're using XML then USE 
XML.  Don't bastardize it.
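
As a rough illustration of the base64 route, here is a minimal sketch using 
Python's standard-library Expat binding (xml.parsers.expat).  The element 
name "blob" and the helper names wrap_payload and extract_payload are made 
up for this example and are not part of Expat; the same idea should carry 
over to the C API with a character data handler.

```python
import base64
import xml.parsers.expat


def wrap_payload(raw: bytes) -> str:
    """Embed arbitrary bytes in a well-formed XML document as base64 text.

    The <blob> element name is a hypothetical choice for this sketch.
    """
    encoded = base64.b64encode(raw).decode("ascii")
    return "<blob>" + encoded + "</blob>"


def extract_payload(document: str) -> bytes:
    """Parse the document with Expat and decode the base64 text back to bytes."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    # Expat may deliver character data in several pieces, so collect them all.
    parser.CharacterDataHandler = chunks.append
    parser.Parse(document, True)
    return base64.b64decode("".join(chunks))


# Round-trip every possible byte value, including ones XML does not allow raw.
raw = bytes(range(256))
document = wrap_payload(raw)
restored = extract_payload(document)
```

Because the base64 alphabet is plain ASCII letters, digits, "+", "/" and 
"=", the payload needs neither a CDATA section nor any escaping, and the 
parser never sees a byte it would reject or re-encode.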



>I am prepared to do some hacking on the source to get this effect.


Please please please don't do that.

>Thank you!
>-Andrew Nesbit