[I18n-sig] How does Python Unicode treat surrogates?

Tom Emerson tree@basistech.com
Mon, 25 Jun 2001 20:57:58 -0400


Martin v. Loewis writes:
> > How then is u"\U00200000" represented internally if you use UCS-2 as
> > the internal storage representation?
> 
> I think the obvious answer is: it is not supported. You will get an
> exception when you try to convert a UTF-8 or UTF-16 string that
> contains such a character, and it will be an error if you pass a
> surrogate to unichr or use one in a \u literal.

So the characters added in Unicode 3.1 in planes 1, 2, and 14 would
not be representable in Python? That seems a bit draconian just to
make your life easier.
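
For reference, here is a minimal sketch of the alternative to rejecting
these characters: the standard UTF-16 arithmetic that maps a
supplementary-plane code point onto a pair of 16-bit surrogates. The
function names are hypothetical (they are not part of Python's API), and
the sample code points U+10400, U+20000, and U+E0001 are simply
illustrative picks from planes 1, 2, and 14.

```python
def to_surrogate_pair(code_point):
    """Split a supplementary-plane code point into (high, low) surrogates.

    Hypothetical helper for illustration; uses the standard UTF-16 formula.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point: %#x" % code_point)
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


def from_surrogate_pair(high, low):
    """Recombine a (high, low) surrogate pair into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# One sample code point from each of planes 1, 2, and 14.
for cp in (0x10400, 0x20000, 0xE0001):
    high, low = to_surrogate_pair(cp)
    print("U+%05X -> %04X %04X" % (cp, high, low))
```

With two 16-bit units per character, a UCS-2 style buffer could still
carry planes 1 through 16. A value such as 0x200000 lies beyond U+10FFFF,
so this scheme cannot encode it and the sketch raises ValueError for it.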

    -tree

-- 
Tom Emerson                                          Basis Technology Corp.
Sr. Sinostringologist                              http://www.basistech.com
  "Beware the lollipop of mediocrity: lick it once and you suck forever"