[XML-SIG] Re: Issues with Unicode type
Martin v. Loewis
martin@v.loewis.de
23 Sep 2002 19:15:43 +0200
Eric van der Vlist <vdv@dyomedea.com> writes:
> I would say that since a XML document is defined as set of unicode
> characters, a single character "&x10800;"
... is ill-formed. Only characters below  are allowed in XML,
strictly speaking.
> is not the same thing as a sequence of two characters.
So what?
> The content of my element <doc>𐠀</doc> doesn't seem to be
> correctly represented as a string of two characters like it is when
> I parse the document! Or have I missed something?
Yes. Python, in a narrow Unicode build, represents this character as a
Unicode object which has a length of 2. It still is a single Unicode
character.
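
To make the length-2 point concrete, here is a small sketch of the
arithmetic involved. It assumes a current Python 3 (3.3 or later), where
narrow builds no longer exist and len() counts code points, so the two
16-bit units are shown by going through UTF-16 explicitly. The helper name
utf16_length is made up for this illustration and is not part of any
library.

```python
import struct

ch = "\U00010800"  # the single character U+10800 from the <doc> element

# Split the code point into a UTF-16 surrogate pair by hand.
offset = ord(ch) - 0x10000
high = 0xD800 + (offset >> 10)    # high (lead) surrogate
low = 0xDC00 + (offset & 0x3FF)   # low (trail) surrogate

# Cross-check against Python's own UTF-16 encoder (little-endian, no BOM).
units = struct.unpack("<2H", ch.encode("utf-16-le"))


def utf16_length(s):
    """Hypothetical helper: number of 16-bit code units needed for s.

    This is the value a narrow Unicode build would have reported as len(s).
    """
    return len(s.encode("utf-16-le")) // 2


print(hex(high), hex(low))       # the two surrogates
print(len(ch), utf16_length(ch)) # code points vs. 16-bit code units
```

The two code units are only a storage detail of the narrow build; together
they still denote the one character U+10800.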
Regards,
Martin