[Python-Dev] Internationalization Toolkit

Tim Peters tim_one@email.msn.com
Thu, 11 Nov 1999 02:27:52 -0500


[/F]
> last time I checked, there were no characters (even in the
> ISO standard) outside the 16-bit range.  has that changed?

[MAL]
> No, but people are already thinking about it and there is
> a defined range in the >16-bit area for private encodings
> (F0000..FFFFD and 100000..10FFFD).

Over the decades I've developed a rule of thumb that has never wound up
stuck in my ass <wink>:  If I engineer code that I expect to be in use for N
years, I make damn sure that every internal limit is at least 10x larger
than the largest I can conceive of a user making reasonable use of at the
end of those N years.  The invariable result is that the N years pass, and
fewer than half of the users have bumped into the limit <0.5 wink>.

At the risk of offending everyone, I'll suggest that, qualitatively
speaking, Unicode is as Eurocentric as ASCII is Anglocentric.  We've just
replaced "256 characters?!  We'll *never* run out of those!" with 64K.  But
when Asian languages consume them 7K at a pop, 64K isn't even in my 10x
comfort range for some individual languages.  In just a few months, Unicode
3 will already have used up > 56K of the 64K slots.

As I understand it, UTF-16 "only" adds 1M new code points.  That's in my 10x
zone, for about a decade.
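
The 1M figure falls out of the surrogate arithmetic: 1024 high surrogates times 1024 low surrogates. Here is a minimal sketch, assuming the surrogate ranges D800..DBFF and DC00..DFFF from the Unicode standard. The function names are made up for illustration and are not part of any proposed API:

```python
# Sketch of UTF-16 surrogate-pair arithmetic (illustrative names only).
HIGH_START, HIGH_END = 0xD800, 0xDBFF   # high (leading) surrogates
LOW_START, LOW_END = 0xDC00, 0xDFFF     # low (trailing) surrogates
SUPP_BASE = 0x10000                     # first code point past the 16-bit range


def supplementary_count():
    """Number of code points reachable through surrogate pairs."""
    highs = HIGH_END - HIGH_START + 1   # 1024
    lows = LOW_END - LOW_START + 1      # 1024
    return highs * lows


def to_surrogates(cp):
    """Split a code point in 10000..10FFFF into a (high, low) pair."""
    if not SUPP_BASE <= cp <= 0x10FFFF:
        raise ValueError("code point outside the surrogate-addressable range")
    offset = cp - SUPP_BASE             # 20-bit value
    return HIGH_START + (offset >> 10), LOW_START + (offset & 0x3FF)


def from_surrogates(high, low):
    """Inverse of to_surrogates()."""
    return SUPP_BASE + ((high - HIGH_START) << 10) + (low - LOW_START)


print(supplementary_count())            # 1048576, i.e. 16 more 64K planes
```

Both private ranges MAL mentions fit: to_surrogates(0xF0000) gives (0xDB80, 0xDC00) and to_surrogates(0x10FFFD) gives (0xDBFF, 0xDFFD), so the top of the second range sits just under the ceiling of what UTF-16 can address.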

predicting-we'll-live-to-regret-it-either-way-ly y'rs  - tim