[Python-Dev] Internationalization Toolkit
Tim Peters
tim_one@email.msn.com
Thu, 11 Nov 1999 02:27:52 -0500
[/F]
> last time I checked, there were no characters (even in the
> ISO standard) outside the 16-bit range. has that changed?
[MAL]
> No, but people are already thinking about it and there is
> a defined range in the >16-bit area for private encodings
> (F0000..FFFFD and 100000..10FFFD).
Over the decades I've developed a rule of thumb that has never wound up
stuck in my ass <wink>: If I engineer code that I expect to be in use for N
years, I make damn sure that every internal limit is at least 10x larger
than the largest I can conceive of a user making reasonable use of at the
end of those N years. The invariable result is that the N years pass, and
fewer than half of the users have bumped into the limit <0.5 wink>.
At the risk of offending everyone, I'll suggest that, qualitatively
speaking, Unicode is as Eurocentric as ASCII is Anglocentric. We've just
replaced "256 characters?! We'll *never* run out of those!" with 64K. But
when Asian languages consume them 7K at a pop, 64K isn't even in my 10x
comfort range for some individual languages. In just a few months, Unicode
3 will already have used up > 56K of the 64K slots.
As I understand it, UTF-16 "only" adds 1M new code points. That's in my 10x
zone, for about a decade.
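For the curious, the "1M" falls out of the standard UTF-16 surrogate-pair scheme: 1024 high surrogates (U+D800..U+DBFF) times 1024 low surrogates (U+DC00..U+DFFF) gives 1,048,576 extra code points, U+10000..U+10FFFF. Below is a minimal Python sketch of that arithmetic, cross-checked against Python's own "utf-16-be" codec. The helper names (to_surrogates, from_surrogates, via_codec) are made up for illustration and are not from any library. It also confirms that both private-use ranges MAL quoted land in the surrogate-pair area.

```python
import struct

# Sketch of standard UTF-16 surrogate-pair arithmetic; the helper
# names below are illustrative, not from any library.
BMP_SIZE = 0x10000                        # code points fitting in one 16-bit unit
HIGH = range(0xD800, 0xDC00)              # 1024 high (lead) surrogates
LOW = range(0xDC00, 0xE000)               # 1024 low (trail) surrogates
SUPPLEMENTARY = len(HIGH) * len(LOW)      # code points reachable only via pairs

def to_surrogates(cp):
    """Split a code point in U+10000..U+10FFFF into a (high, low) pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point: %#x" % cp)
    offset = cp - 0x10000                 # 20-bit value: 10 bits per surrogate
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)

def from_surrogates(high, low):
    """Inverse of to_surrogates."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

def via_codec(cp):
    """Cross-check against Python's own UTF-16 (big-endian, no BOM) codec."""
    return struct.unpack(">2H", chr(cp).encode("utf-16-be"))

if __name__ == "__main__":
    print(SUPPLEMENTARY, hex(0x10FFFF - 0xFFFF))   # 1048576 0x100000
    # The private-use ranges quoted above both sit in the surrogate-pair area.
    for cp in (0xF0000, 0xFFFFD, 0x100000, 0x10FFFD):
        pair = to_surrogates(cp)
        assert pair == via_codec(cp) and from_surrogates(*pair) == cp
        print("U+%06X -> %04X %04X" % ((cp,) + pair))
```

Counting the original 64K, that's 17 planes of 64K each in total. Whether that stays inside anyone's 10x comfort zone is, of course, the point in dispute.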
predicting-we'll-live-to-regret-it-either-way-ly y'rs - tim