[XML-SIG] (Py)DOM: Character References

Lars Marius Garshol larsga@ifi.uio.no
19 Mar 1999 09:15:30 +0100


* Carsten Oberscheid
| 
| Ok, since charrefs encode only characters from the document's base
| character set (Unicode for XML, ASCII for SGML -- is that right?)

No.  XML uses Unicode, and since XML is SGML (an SGML application
profile, to be precise), it follows that SGML cannot be limited to
ASCII.  In fact, SGML as a meta-language does not have a fixed
document character set: the SGML declaration allows you to define
your own character set in terms of well-known character sets.

So, SGML can use Unicode/ISO 10646, as for example HTML 4.0 does[1],
but it can also use any other character set which consists of
well-known characters. It also has standard ways of handling
characters that are not in the character sets.  I don't think it can
handle every character encoding, though I might be wrong.
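
To illustrate the XML side of this, here is a minimal Python sketch.
It assumes the standard library's xml.dom.minidom rather than PyDOM
itself, and the sample document and the element_text helper are made
up for the example.  The byte stream is ISO-8859-1, which has no euro
sign, yet the character reference &#8364; still resolves to U+20AC,
because in XML a charref names a Unicode code point regardless of the
encoding the document is stored in.

```python
from xml.dom import minidom

# Hypothetical sample: the bytes are ISO-8859-1, which cannot encode
# the euro sign (U+20AC) directly, so it is written as a charref.
SAMPLE = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>'
    '<price>&#8364;10 caf\u00e9 &#xE9;</price>'
).encode("iso-8859-1")


def element_text(xml_bytes):
    """Return the concatenated text children of the root element."""
    root = minidom.parseString(xml_bytes).documentElement
    return "".join(node.data for node in root.childNodes
                   if node.nodeType == node.TEXT_NODE)


text = element_text(SAMPLE)
# Show the code point of every character the parser delivered.
print([hex(ord(ch)) for ch in text])
```

The literal 0xE9 byte and the reference &#xE9; come out as the same
character, so the application never sees which form the document used.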

--Lars M.

[1] <URL:http://www.w3.org/TR/REC-html40/sgml/sgmldecl.html>