[XML-SIG] Re: Issues with Unicode type

Martin v. Loewis martin@v.loewis.de
23 Sep 2002 19:15:43 +0200


Eric van der Vlist <vdv@dyomedea.com> writes:

> I would say that since an XML document is defined as a set of Unicode
> characters, a single character "&x10800;" 

... is ill-formed. Only characters below &#xFFFF; are allowed in XML,
strictly speaking.

> is not the same thing as a sequence of two characters.

So what?

> The content of my element <doc>&#67584;</doc> doesn't seem to be
> correctly represented as a string of two characters like it is when
> I parse the document! Or have I missed something?

Yes. Python, in a narrow Unicode build, represents this character as a
Unicode object which has a length of 2. It still is a single Unicode
character.
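
A minimal sketch of the same point, assuming a current Python 3
interpreter, which has no narrow builds any more (so len() reports 1
there). The helper utf16_code_units is a hypothetical name used only for
illustration; it lists the UTF-16 code units that a narrow build would
store for this one character:

```python
import sys
import xml.dom.minidom


def utf16_code_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


# Parse the document from the question and take the element's text content.
doc = xml.dom.minidom.parseString("<doc>&#67584;</doc>")
content = doc.documentElement.firstChild.data

# One character (U+10800), but two UTF-16 code units: a surrogate pair.
units = utf16_code_units(content)
print([hex(u) for u in units])  # ['0xd802', '0xdc00']

# On a narrow build, len(content) would be 2 and sys.maxunicode 0xFFFF.
print(len(content), hex(sys.maxunicode))
```

The two code units are what a narrow build counts when it reports a
length of 2; the parsed content is the same single character either way.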

Regards,
Martin