[I18n-sig] Unicode surrogates: just say no!

Guido van Rossum guido@digicool.com
Tue, 26 Jun 2001 16:36:33 -0400


> > I expect that not all Unicode users will be ready to embrace UCS-4.  I
> > don't want to hear people say "I don't want to upgrade to Python 2.2
> > because it uses 4 bytes per Unicode character, but all I ever do is
> > bandy around basic plane characters."  Given that there's currently
> > very limited need for characters outside the basic plane, I want to be
> > able to say that Python 2.2 is UCS-4 ready, but not that it always
> > uses it.
> 
> I'm not dead-set against this but I want to point out that binary
> distributors are probably not going to bother shipping two different
> binaries. So the silent majority of Python users who download
> precompiled binaries are going to have a "flag day" where Python changes
> its default behaviour.

Distributors know their users best -- they can decide when it's time.
E.g. I expect Asian Linux distributors to take the lead here, and
American distributors to follow last, with European distributors in
the middle.

Users with different wishes (most likely users with a desire for UCS-4
in a UCS-2 world) can always build from source.
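Because the same code may run on either kind of build, a program may want to check which one it has. A minimal sketch follows, assuming only the standard `sys.maxunicode` attribute (0xFFFF on a narrow/UCS-2 build, 0x10FFFF on a wide/UCS-4 build; always 0x10FFFF from Python 3.3 on). The helper names `build_width` and `to_surrogates` are hypothetical. The second helper shows the arithmetic a narrow build uses to store a character outside the basic plane as two 16-bit surrogate code units.

```python
import sys


def build_width():
    # sys.maxunicode is 0xFFFF on a narrow (UCS-2) build and 0x10FFFF on a
    # wide (UCS-4) build; since Python 3.3 it is always 0x10FFFF.
    return "UCS-2 (narrow)" if sys.maxunicode == 0xFFFF else "UCS-4 (wide)"


def to_surrogates(cp):
    """Hypothetical helper: split a code point above U+FFFF into the
    UTF-16 surrogate pair a narrow build stores it as."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not outside the basic plane")
    offset = cp - 0x10000              # 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits -> low surrogate
    return high, low


print(build_width())
print([hex(u) for u in to_surrogates(0x1D11E)])  # ['0xd834', '0xdd1e']
```

On a narrow build, such a character therefore counts as two units in `len()`, which is the behavioral difference between builds under discussion here.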

> Given infinite resources, I'd rather see "best of both worlds"
> implementations such as a flag on the Unicode object that chooses its
> internal representation (i.e., a speed tweak for the knowledgeable) or
> objects that "fall back" from ASCII to UCS-2 to UCS-4 depending on the
> input data. Or even a unicode32() data type that was interoperable with
> unicode16 (and the default could change from one to the other someday).
> 
> I accept that in a world of finite resources there may be nobody
> interested enough to put in that effort, but I'd rather see the option
> excluded on that basis than just because the code becomes more
> complex. The code complexity would be worth it if it prevents a minor
> fork in Python and varying behavior on different Pythons.
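The "fall back" idea in the quoted text amounts to picking the narrowest fixed width that can hold every character of a given string. A minimal sketch of that selection step follows. The function `storage_width` is hypothetical (not an existing Python API), and it follows the ASCII, UCS-2, UCS-4 ladder named above rather than any particular implementation.

```python
def storage_width(text):
    """Hypothetical: bytes per character a "fall back" string object
    would use for `text`, choosing the narrowest sufficient width."""
    highest = max(map(ord, text), default=0)
    if highest < 0x80:
        return 1   # pure ASCII
    if highest < 0x10000:
        return 2   # basic plane only: UCS-2 without surrogates
    return 4       # something outside the basic plane: UCS-4


for s in ["spam", "caf\u00e9", "\u4e2d\u6587", "\U0001D11E"]:
    print(repr(s), storage_width(s), "bytes/char")
```

Under this ladder a non-ASCII Latin-1 string such as "café" lands in the 2-byte tier, and a single character outside the basic plane widens the whole string to 4 bytes per character.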

But you don't have to maintain it.  I say that this particular varying
behavior is just as acceptable as the varying int size.

Do you want to write the PEP?

--Guido van Rossum (home page: http://www.python.org/~guido/)