[XML-SIG] unicode data

Lars Marius Garshol larsga@garshol.priv.no
06 Nov 2000 10:04:36 +0100


* Alexandre Fayolle
| 
| It means that the DOM spec specifies that CharacterData holds
| its data in a DOMString type, and adds:
| 
| "Applications must encode DOMString using UTF-16 (defined in
| [Unicode] and Amendment 1 of [ISO/IEC 10646])"
| 
| Using an interface not called DOMString is allowed by the spec, as
| long as the encoding is UTF-16.

Actually, this part of the spec is both confused and confusing.  The
character encoding only matters in primitive programming languages
that have no suitable wide string type, which in practice means C and
C++.

That the UTF-16 requirement happens to fit Java, Tcl and Python is
largely luck, since both UTF-8 (used by Perl) and UCS-4 (used by gcc)
are credible alternatives.

In most languages, the character encoding used in wide strings is
something the DOM should keep quiet about.
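
To illustrate, here is a minimal sketch in present-day Python 3.  The
sample string and the helper name encoded_sizes are made up for this
example and come from neither the spec nor any DOM implementation.  It
suggests that a string is just a sequence of characters to the
programmer, and that UTF-8, UTF-16 and UCS-4 only become visible once
the string is serialized to bytes.

```python
# Hypothetical sample text: 'naïve €' (7 characters).
text = "na\u00efve \u20ac"


def encoded_sizes(s):
    """Return the number of bytes s occupies in three Unicode encodings.

    The "-le" codec variants are used so that no byte order mark is
    added to the output.
    """
    return {
        "utf-8": len(s.encode("utf-8")),
        "utf-16": len(s.encode("utf-16-le")),
        "ucs-4": len(s.encode("utf-32-le")),
    }


sizes = encoded_sizes(text)

# Every encoding decodes back to the same abstract string.
roundtrip = all(
    text.encode(codec).decode(codec) == text
    for codec in ("utf-8", "utf-16-le", "utf-32-le")
)

print(len(text), sizes, roundtrip)
```

The character count stays the same whichever encoding is chosen; only
the byte counts differ, and those are an implementation detail that
the DOM user never needs to see.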

--Lars M.