[Python-Dev] 2.2 Unicode questions
Simon Cozens
simon@netthink.co.uk
Thu, 19 Jul 2001 10:15:49 -0400
On Thu, Jul 19, 2001 at 10:09:33AM -0400, Guido van Rossum wrote:
> > > Untrue: it supports range(0x110000) (in UCS-2 mode this returns a
> > > surrogate pair). Now, maybe that's not what it *should* do...
> >
> > It should definitely not, unless you want to break code which assumes
> > that chr() and unichr() always return a single byte/code unit !
>
> Reasonable people can disagree about this.

It certainly should not, if by UCS-2 you actually mean UCS-2.
UCS-2 can't access characters outside the Basic Multilingual Plane,
and so shouldn't be using surrogates.

If by UCS-2 you actually mean UTF-16, then using surrogates is the
right approach. :)
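
For illustration, here is a minimal sketch of how UTF-16 maps a code point outside the Basic Multilingual Plane onto a surrogate pair. The function name to_surrogates is hypothetical and not part of Python; the arithmetic follows the UTF-16 definition, and the example code point U+1D11E is chosen arbitrarily. It assumes a Python 3 interpreter.

```python
def to_surrogates(code_point):
    """Split a code point above U+FFFF into a UTF-16 surrogate pair.

    Hypothetical helper for illustration only.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits -> low surrogate
    return high, low


high, low = to_surrogates(0x1D11E)
print(hex(high), hex(low))  # 0xd834 0xdd1e
```

A strict UCS-2 implementation has no equivalent of this step: it has a single 16-bit code unit per character and simply cannot represent 0x1D11E.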
Simon