[I18n-sig] How does Python Unicode treat surrogates?
Tom Emerson
tree@basistech.com
Mon, 25 Jun 2001 20:57:58 -0400
Martin v. Loewis writes:
> > How then is u"\U00200000" represented internally if you use UCS-2 as
> > the internal storage representation?
>
> I think the obvious answer is: It is not supported. It will give an
> exception when you try to convert a UTF-8 or UTF-16 string that has
> such a character, it will be an error if you pass a surrogate to
> unichr, or in a \u literal.
So the characters added in Unicode 3.1 in planes 1, 2, and 14 would
not be representable in Python? That seems a bit draconian, just to
make your life easier.
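For reference, UTF-16 carries characters in those planes as a pair of
16-bit surrogate code units. Here is a minimal sketch of the arithmetic,
using U+20000 (the first code point in plane 2) as the example. The
helper name to_surrogate_pair is made up for illustration and is not
part of Python, and this says nothing about how the interpreter itself
stores strings.

```python
def to_surrogate_pair(code_point):
    """Split a supplementary-plane code point into UTF-16 surrogates.

    Hypothetical helper for illustration only; not a Python built-in.
    """
    # Only U+10000..U+10FFFF need (and can have) a surrogate pair.
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    offset = code_point - 0x10000      # 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits -> low surrogate
    return high, low


high, low = to_surrogate_pair(0x20000)
print(hex(high), hex(low))  # 0xd840 0xdc00
```

Note that 0x200000, the value in the quoted literal, lies above
U+10FFFF, so it has no surrogate-pair form at all and the sketch
rejects it with ValueError.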
-tree
--
Tom Emerson Basis Technology Corp.
Sr. Sinostringologist http://www.basistech.com
"Beware the lollipop of mediocrity: lick it once and you suck forever"