[Python-Dev] Unicode and XML

Paul Prescod paul@prescod.net
Mon, 17 Apr 2000 08:37:19 -0500


Let's presume that we agreed that XML is not a language because it
doesn't have semantics. What does that have to do with the applicability
of its Unicode-handling model?

Here is a list of a hundred specifications which we can probably agree
have "useful semantics" that are all based on XML and thus have the same
Unicode model:

http://www.xml.org/xmlorg_registry/index.shtml

XML's unicode model seems mostly appropriate to me. I can only see one
reason it might not apply: which comes first the #! line or the
#encoding line? We could say that the #! line can only be used in
encodings that are direct supersets of ASCII (e.g. UTF-8 but not
UTF-16). That shouldnt' cause any problems with Unix because as far as I
know, Unix can only read the first line if it is in an ASCII superset
anyhow!

Then the second line could describe the precise ASCII superset in use
(8859-1, 8859-2, UTF-8, raw ASCII, etc.).

-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for himself
When George Bush entered office, a Washington Post-ABC News poll found
that 62 percent of Americans "would be willing to give up a few of the
freedoms we have" for the war effort. They have gotten their wish.
	- "This is your bill of rights...on drugs", Harpers, Dec. 1999