[XML-SIG] Re: Issues with Unicode type

Daniel Veillard veillard@redhat.com
Mon, 23 Sep 2002 16:33:38 -0400


On Mon, Sep 23, 2002 at 07:12:10PM +0200, Martin v. Loewis wrote:
> Eric van der Vlist <vdv@dyomedea.com> writes:
> 
> > > By default Python is using UTF-16 as its Unicode encoding. The
> > > code-point that you specify, U+10800, is outside the BMP and hence is
> > > represented by two surrogate characters in UTF-16.
> > 
> > Arg! Does that mean that by default Python doesn't strictly conform to
> > XML 1.0?
> 
> No. Why do you think this? Strictly speaking, XML 1.0 defines a
> "character" as defined by ISO/IEC 10646:1993 and ISO/IEC 10646-1:2000.
> This means only characters in the Basic Multilingual Plane are allowed
> in XML. James Clark's document is, strictly speaking, ill-formed.

  No, it's not; it's a well-formed document. Strictly speaking, a document
is either well-formed or not; there is no other definition, and that
definition is given in the XML specification.
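
As a rough illustration of the point, here is a minimal sketch in modern
Python 3. The Char production of XML 1.0 includes the range
#x10000-#x10FFFF, so U+10800 is a legal XML character even though UTF-16
needs a surrogate pair to encode it. The helper names is_xml_char and
utf16_surrogates are made up for this example and do not come from libxml
or PyXML. Note also that current Python 3 strings store whole code points,
so the parsed text has length 1 here, whereas the UTF-16 builds discussed
in this thread would report 2.

```python
import xml.etree.ElementTree as ET


def is_xml_char(cp):
    """Return True if code point cp matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def utf16_surrogates(cp):
    """Split a code point above U+FFFF into its UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = cp - 0x10000
    # The high surrogate carries the top 10 bits, the low one the bottom 10.
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


cp = 0x10800
high, low = utf16_surrogates(cp)

# A document containing the character as a numeric reference parses cleanly.
text = ET.fromstring("<doc>&#x10800;</doc>").text

print(is_xml_char(cp), hex(high), hex(low), len(text))
```

The surrogate code points themselves (U+D800 to U+DFFF) are excluded from
Char, which is why is_xml_char rejects them individually while accepting
the supplementary character they encode as a pair.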

Daniel

-- 
Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard@redhat.com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/