[XML-SIG] XML Unicode and UTF-8
Mike Brown
mike at skew.org
Thu Aug 5 22:27:29 CEST 2004
Paul Boddie wrote:
> Do this instead:
>
> utext = segment[0].decode( segment[1] )
The resulting Unicode object may contain characters which are not allowed in
XML (most control characters below U+0020, for example), and thus the text
may not be serializable (at least not in a way that would produce well-formed
XML).
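
To make the problem concrete, here is a minimal sketch of a check against the
Char production of XML 1.0. It assumes a modern Python 3 interpreter rather
than the Python 2 of the quoted snippet, and the helper name
first_illegal_xml_char is made up for illustration; it is not part of any
library.

```python
import re

# Anything outside the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_XML_ILLEGAL = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def first_illegal_xml_char(text):
    """Return the index of the first character XML 1.0 forbids, or None.

    Hypothetical helper, for illustration only.
    """
    match = _XML_ILLEGAL.search(text)
    return match.start() if match else None


# Decoding arbitrary bytes succeeds, but the result can still be
# unusable as XML character data: form feed (0x0C) is not allowed.
utext = b"ok\x0c".decode("latin-1")
bad_index = first_illegal_xml_char(utext)
```

If the function returns an index rather than None, the text cannot be written
out as well-formed XML 1.0 as-is, not even as a character reference.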
To embed arbitrary bytes in XML, the usual advice is to first convert the
bytes into a character sequence that is permitted in XML. Base64 is a popular
and easily implemented option, although it inflates the data by about a third
(4 output characters for every 3 input bytes). The article at
http://www.javaworld.com/javaworld/javatips/jw-javatip117-p2.html suggests
that a custom Huffman implementation is nearly 1:1. I've mapped bytes into the
Private Use Area of Unicode before, too, although that's definitely not
efficient: each byte becomes a character that takes at least three bytes in
UTF-8.
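
As a rough sketch of the Base64 route, again assuming Python 3 and only the
standard library: the element name "blob", the "encoding" attribute, and the
two helper functions are invented for this example, not a convention of any
particular XML vocabulary.

```python
import base64
import xml.etree.ElementTree as ET


def bytes_to_element(tag, data):
    """Wrap arbitrary bytes in an element as Base64 text (hypothetical helper)."""
    elem = ET.Element(tag, {"encoding": "base64"})
    # Base64 output is plain ASCII, so every character is legal in XML.
    elem.text = base64.b64encode(data).decode("ascii")
    return elem


def element_to_bytes(elem):
    """Recover the original bytes from an element built by bytes_to_element."""
    return base64.b64decode(elem.text or "")


# Round trip every possible byte value, including ones XML forbids as text.
payload = bytes(range(256))
xml_bytes = ET.tostring(bytes_to_element("blob", payload))
restored = element_to_bytes(ET.fromstring(xml_bytes))
```

The receiving side has to know that the element content is Base64, which is
why the sketch marks it with an attribute; any out-of-band agreement would do
just as well.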