[XML-SIG] Re: Issues with Unicode type

Martin v. Loewis martin@v.loewis.de
24 Sep 2002 00:40:36 +0200


Uche Ogbuji <uche.ogbuji@fourthought.com> writes:

> This just deepens my unease at Guido's reluctance to support
> surrogates in the code that handles UTF-16 in Python.  The
> inconsistency seems ugly.

However, it is unavoidable. It was also all decided long ago; see
PEP 261.
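
To make the surrogate issue concrete, here is a minimal sketch of how a
code point outside the BMP maps to a UTF-16 surrogate pair. It assumes a
modern Python 3 interpreter; the helper name to_surrogate_pair is made up
for illustration and is not taken from PEP 261 or from Python itself.

```python
import struct

def to_surrogate_pair(code_point):
    """Split a non-BMP code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above the BMP need surrogates")
    offset = code_point - 0x10000          # 20 significant bits remain
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low

# Cross-check against the built-in UTF-16 codec (big-endian, no BOM).
cp = 0x1D11E
pair = to_surrogate_pair(cp)
codec_pair = struct.unpack(">2H", chr(cp).encode("utf-16-be"))
print([hex(u) for u in pair], pair == codec_pair)
```

With a 16-bit Py_UNICODE, such a character occupies two code units in a
Unicode string, which is where the inconsistency comes from.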

> But as Tom says, it looks like this matter has been beaten to death,
> and it's pretty much settled.  Now I see why Red Hat plumped on
> compiling Python with UTF-32 support (and wchar_t).  I think it's
> the only route to sanity.

On Unix, I was indeed fighting to make Py_UNICODE equal to wchar_t
where possible. Guido disliked this on grounds of uniformity and
space savings.

> Having said all this, Martin is right about XML and the BMP.  I'd
> forgotten.

Actually, I now think that the XML spec is inconsistent. In one place,
it allows non-BMP references; in another place, it points to
specifications that restrict themselves to the BMP.
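
As a sketch of the first half of that claim: the Char production in XML
1.0 includes the range #x10000-#x10FFFF, so a character reference to a
non-BMP code point is well-formed. That this is the "one place" meant
above is an assumption. The example uses xml.etree.ElementTree from a
modern Python 3 standard library; the helper name parse_char_ref is
hypothetical.

```python
import xml.etree.ElementTree as ET

def parse_char_ref(code_point):
    """Parse a tiny document holding one hexadecimal character reference."""
    doc = "<a>&#x%X;</a>" % code_point
    return ET.fromstring(doc).text

# U+10000 is the first code point beyond the BMP.
text = parse_char_ref(0x10000)
print(hex(ord(text)))
```

The parser accepts the reference and returns the character itself; how
that character is then stored depends on the width of Py_UNICODE.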

Regards,
Martin