[Python-Dev] 2.2 Unicode questions

Guido van Rossum guido@digicool.com
Thu, 19 Jul 2001 10:58:23 -0400


> > But isn't the whole point of UTF-16 to fool code that believes it's
> > manipulating UCS-2 into a false sense of security? :-)
> 
> Well, sort of. More like fooling into a true sense of insecurity. :)

Same difference. :-)

> Anyway, the Standard sez that a conforming UCS-2 application will
> not use characters in the surrogates area. Future versions of ISO10646
> and the Unicode Standard will probably require UTF-16 instead of UCS-2.

So the proper way to code *libraries* that use 16-bit data would be
not to commit either way on the issue: don't generate surrogates on
your own account, but don't actively reject them either; instead,
pass them through transparently.  This should conform to both UCS-2
and UTF-16.
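
As a rough illustration of that policy, here is a minimal sketch that
treats text as a list of 16-bit code units.  The names is_surrogate,
map_units and upper_ascii are made up for this example and come from
no existing library; the only hard fact assumed is that the surrogates
area is U+D800 through U+DFFF.

```python
def is_surrogate(unit):
    """True if a 16-bit code unit lies in the surrogates area."""
    return 0xD800 <= unit <= 0xDFFF


def map_units(units, transform):
    """Apply transform to each code unit, copying surrogates unchanged.

    Surrogates are neither rejected nor interpreted, so the same code
    works whether the caller thinks of the data as UCS-2 or UTF-16.
    """
    out = []
    for unit in units:
        if is_surrogate(unit):
            out.append(unit)  # pass through transparently
        else:
            out.append(transform(unit))
    return out


def upper_ascii(unit):
    """Hypothetical transform: uppercase a-z, leave everything else alone."""
    if 0x61 <= unit <= 0x7A:
        return unit - 0x20
    return unit


# 'a', a surrogate pair, 'z': only the two letters change.
result = map_units([0x61, 0xD83D, 0xDE00, 0x7A], upper_ascii)
```

The transform here never produces a value in the surrogates area, so
the library generates none of its own; a lone surrogate in the input
comes out exactly as it went in.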

--Guido van Rossum (home page: http://www.python.org/~guido/)