[Python-Dev] UTF-16 code point comparison

Bill Tutt billtut@microsoft.com
Fri, 28 Jul 2000 09:42:56 -0700


> From: Tim Peters [mailto:tim_one@email.msn.com]

> [Tim]
> > ... Don't know how long it will take this half of the world to
> > realize it, but UCS-4 is inevitable.
>
> [Bill Tutt]
> > On new systems perhaps, but important existing systems (Win32,
> > and probably Java) are stuck with that bad decision and have to
> > use UTF-16 for backward compatibility purposes.

> Somehow that doesn't strike me as a good reason for Python to mimic them
> <wink>.

So don't. If you think UTF-16 is yet another bad engineering decision, then
take the hit now and make Python's Unicode support natively UCS-4, so we
don't have a backward compatibility problem when the next Unicode or ISO
10646 revision comes out.
Just realize and accept the cost of doing so: constant conversions for a
nice big chunk of your users.
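
As a rough illustration of that conversion cost, here is a minimal sketch in
present-day Python 3 (not the Python under discussion in this thread). It
assumes strings are held as sequences of code points and must be transcoded
to UTF-16 code units at every boundary with a UTF-16 system such as Win32.
The helper names to_utf16_units and from_utf16_units are hypothetical; only
the standard "utf-16-le" codec is real.

```python
def to_utf16_units(text):
    """Transcode a str (code points) into a list of 16-bit UTF-16 code units."""
    data = text.encode("utf-16-le")
    # Each code unit occupies two bytes in little-endian order.
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def from_utf16_units(units):
    """Transcode a list of 16-bit UTF-16 code units back into a str."""
    data = b"".join(unit.to_bytes(2, "little") for unit in units)
    return data.decode("utf-16-le")


# One BMP character plus one character outside the BMP:
# two code points in, three UTF-16 code units out.
units = to_utf16_units("A\U00010000")
round_tripped = from_utf16_units(units)
```

Both directions are linear in the length of the string, which is the price
paid on each call across such a boundary.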

> > Surrogates aren't as far out as you might think. (The next rev of
> > the Unicode spec)

> But indeed, that's the *point*:  they exhausted their 64K space in just a
> few years.  Now the same experts say that adding 4 bits to the range will
> suffice for all time; I don't buy it; they picked 4 bits because that's what
> the surrogate mechanism was defined earlier to support.

I don't think the experts are saying the extra 4 bits will suffice for all
time, but it should certainly suffice until we meet aliens from a different
planet. :)
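
For reference, the "extra 4 bits" fall out of the surrogate arithmetic: 1024
high surrogates times 1024 low surrogates give 2**20 code points beyond the
original 64K, i.e. 16 additional planes. The sketch below is illustrative
only; the function and constant names are made up for this example, while
the numeric ranges are the ones UTF-16 defines.

```python
HIGH_START = 0xD800        # first high (leading) surrogate
LOW_START = 0xDC00         # first low (trailing) surrogate
SURROGATES_PER_HALF = 0x400  # 1024 values in each surrogate range

# Code points reachable only through surrogate pairs, and the resulting ceiling.
EXTRA_CODE_POINTS = SURROGATES_PER_HALF * SURROGATES_PER_HALF  # 2**20
MAX_CODE_POINT = 0x10000 + EXTRA_CODE_POINTS - 1


def encode_surrogates(code_point):
    """Split a code point above the BMP into a (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= MAX_CODE_POINT:
        raise ValueError("code point is not in the surrogate-encoded range")
    offset = code_point - 0x10000
    return HIGH_START + (offset >> 10), LOW_START + (offset & 0x3FF)


def decode_surrogates(high, low):
    """Combine a (high, low) surrogate pair back into one code point."""
    if not (HIGH_START <= high < LOW_START and LOW_START <= low < 0xE000):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - HIGH_START) << 10) + (low - LOW_START)
```

So the ceiling of the mechanism is fixed by the size of the two surrogate
ranges, which is the point Tim is making about where the 4 bits came from.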

> > That's certainly sooner than Win32 going away.  :)

> I hope it stays around forever -- it's a great object lesson in what
> optimizing for yesterday's hardware can buy you <wink>.

Heh. A dev manager from Excel made the exact same comment to me just
yesterday. :)

Bill