[Python-Dev] 2.2 Unicode questions

Simon Cozens simon@netthink.co.uk
Thu, 19 Jul 2001 10:15:49 -0400


On Thu, Jul 19, 2001 at 10:09:33AM -0400, Guido van Rossum wrote:
> > > Untrue: it supports range(0x110000) (in UCS-2 mode this returns a
> > > surrogate pair).  Now, maybe that's not what it *should* do...
> > 
> > It should definitely not, unless you want to break code which assumes
> > that chr() and unichr() always return a single byte/code unit !
> 
> Reasonable people can disagree about this.

It certainly should not, if by UCS-2 you actually mean UCS-2.
UCS-2 is a fixed-width 16-bit encoding: it can't represent characters
outside the Basic Multilingual Plane, and so shouldn't be using surrogates.

If by UCS-2 you actually mean UTF-16, then using surrogates is the
right approach. :)
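
To make the distinction concrete, here is a rough sketch of how the two
encodings treat a code point. The function names utf16_code_units and
ucs2_code_units are hypothetical, made up for this illustration; they are
not part of Python or of any proposal in this thread.

```python
def utf16_code_units(cp):
    """Return the UTF-16 code units (as ints) for code point cp.

    Hypothetical helper, for illustration only.
    """
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point: %r" % (cp,))
    if cp < 0x10000:
        # Inside the Basic Multilingual Plane: one 16-bit code unit.
        return [cp]
    # Outside the BMP: split the 20-bit offset into a surrogate pair.
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return [high, low]


def ucs2_code_units(cp):
    """Return the single UCS-2 code unit for cp, or raise ValueError.

    Hypothetical helper: strict UCS-2 has no surrogate mechanism, so
    anything beyond U+FFFF simply cannot be encoded.
    """
    if not 0 <= cp <= 0xFFFF:
        raise ValueError("code point %r is outside the BMP" % (cp,))
    return [cp]
```

With these definitions, utf16_code_units(0x10000) gives the pair
[0xD800, 0xDC00], while ucs2_code_units(0x10000) raises ValueError.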

Simon