[Expat-discuss] Encoding lower 32 characters
Paul Prescod
paulp@ActiveState.com
Mon, 30 Apr 2001 15:08:12 -0700
Michael Wissner wrote:
>
> ...
>
> Since I find it hard to believe that certain US-ASCII characters were
> omitted from Unicode, my next guess is that the intent of the XML spec is to
> say that those special characters are not valid in an XML file; that a valid
> XML file should encode those characters using character references such as
> "" so that they don't appear literally in the file.
"Well-Formedness Constraint: Legal Character
Characters referred to using character references must match the
production for Char."
[2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] |
[#x10000-#x10FFFF]
/* any Unicode character, excluding the surrogate blocks, FFFE, and
FFFF. */
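
To make the production concrete, here is a minimal sketch of a Char check in Python. The function names is_xml_char and illegal_chars are made up for illustration and are not part of expat or any XML library; the ranges are copied directly from production [2] above.

```python
def is_xml_char(cp: int) -> bool:
    """Return True if code point `cp` matches production [2] Char (XML 1.0)."""
    return (
        cp in (0x9, 0xA, 0xD)           # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF         # excludes the other C0 controls
        or 0xE000 <= cp <= 0xFFFD       # skips surrogates, FFFE and FFFF
        or 0x10000 <= cp <= 0x10FFFF    # supplementary planes
    )


def illegal_chars(text: str) -> list:
    """List (index, hex code point) for every character Char does not allow."""
    return [(i, hex(ord(c))) for i, c in enumerate(text)
            if not is_xml_char(ord(c))]
```

Under this check, NAK (0x15) and ESC (0x1B) are rejected whether they appear literally or through a character reference, because the constraint quoted above applies the same production to both.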
> ... Is it a bug in the XML spec?
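Well, it is intentional: under the constraint quoted above, a character
reference cannot name a character that Char excludes. You could argue
that it is a wrong intention, though. :)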
> ... If it's
> correct, how can I transmit application data that contains these characters?
> Clearly I can create my own application-level escaping mechanism, but
> doesn't this defeat the purpose of having an application-independent
> standard like XML?
It defeats part of the purpose, but encoding "control characters" is
actually pretty rare. You could make the argument that "<", ">" and "&"
are XML's control characters so the others would be redundant. If you
want to insert a NAK or ESC, I'd suggest <NAK/> or <ESC/> and so on.
You could even standardize your encoding for these characters. :)
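
As a rough sketch of the element-based approach, assuming Python and its standard xml.sax.saxutils.escape helper: the CONTROL_NAMES table and the encode_controls function are hypothetical names invented here, and the mapping covers only the two characters mentioned above (NAK is 0x15, ESC is 0x1B). A real application would have to agree on its own element vocabulary.

```python
from xml.sax.saxutils import escape

# Hypothetical application-level mapping; extend it to whatever control
# characters your data actually contains.
CONTROL_NAMES = {0x15: "NAK", 0x1B: "ESC"}


def encode_controls(text: str) -> str:
    """Replace mapped control characters with empty elements such as <ESC/>.

    Everything else goes through the usual escaping of &, < and >.
    """
    out = []
    for ch in text:
        name = CONTROL_NAMES.get(ord(ch))
        if name is not None:
            out.append("<%s/>" % name)
        else:
            out.append(escape(ch))
    return "".join(out)
```

The result is meant to be used as element content, for example "<msg>" + encode_controls(data) + "</msg>". The receiving side then has to turn each <NAK/> or <ESC/> element back into the corresponding character, which is the application-level convention the original question was hoping to avoid.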
--
Take a recipe. Leave a recipe.
Python Cookbook! http://www.ActiveState.com/pythoncookbook