[Python-Dev] Internationalization Toolkit

M.-A. Lemburg mal@lemburg.com
Wed, 10 Nov 1999 14:13:10 +0100


Jean-Claude Wippler wrote:
> 
> Greg Stein wrote:
> [MAL:]
> > > The downside of using UTF16: it is a variable length format,
> > > so iterations over it will be slower than for UCS4.
> >
> > Bzzt. May as well go with UTF-8 as the internal format, much like Perl
> > is doing (as I recall).
> 
> Ehm, pardon me for asking - what is the brief rationale for selecting
> UCS2/4, or whatever it ends up being, over UTF8?

UCS-2 is the native format on major platforms (meaning a straight
fixed-length encoding using 2 bytes per character), so interfacing
between Python's Unicode object and the platform APIs will be simple
and fast.
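
A minimal sketch of what "fixed length" buys, assuming text limited to
characters that fit in a single 2-byte unit. The helper name
ucs2_char_at is hypothetical, and Python's "utf-16-le" codec is used
here only as a stand-in for a raw UCS-2 buffer:

```python
def ucs2_char_at(buf: bytes, index: int) -> str:
    """Return character `index` from a fixed-width 2-byte buffer.

    Assumption: every character occupies exactly one 2-byte unit,
    so the byte offset is simply 2 * index (no scanning needed).
    """
    start = 2 * index
    return buf[start:start + 2].decode("utf-16-le")


text = "Grüße, 世界"
buf = text.encode("utf-16-le")   # stand-in for a UCS-2 buffer

# Every character takes exactly two bytes in this buffer.
assert len(buf) == 2 * len(text)

third = ucs2_char_at(buf, 2)     # direct jump to byte offset 4
eighth = ucs2_char_at(buf, 7)    # direct jump to byte offset 14
```

Indexing costs the same no matter how far into the string the
character sits, which is the property the paragraph above relies on.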

UTF-8 is compact for ASCII users, but imposes a performance
hit for the CJK (Asian character sets) world, since UTF-8 uses
*variable* length encodings.
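
As a rough illustration of the trade-off, the sketch below compares
encoded sizes and shows why indexing into UTF-8 needs a scan. It
assumes well-formed UTF-8 input; the helper names utf8_seq_len and
utf8_char_at are hypothetical, and "utf-16-le" again stands in for a
2-byte fixed-width encoding:

```python
def utf8_seq_len(lead: int) -> int:
    """Length in bytes of a UTF-8 sequence, given its lead byte.

    Assumption: `lead` really is a lead byte of valid UTF-8.
    """
    if lead < 0x80:
        return 1      # ASCII
    if lead < 0xE0:
        return 2
    if lead < 0xF0:
        return 3      # most CJK characters land here
    return 4


def utf8_char_at(buf: bytes, index: int) -> str:
    """Return character `index` by walking the buffer from the start."""
    pos = 0
    for _ in range(index):
        pos += utf8_seq_len(buf[pos])
    return buf[pos:pos + utf8_seq_len(buf[pos])].decode("utf-8")


ascii_text = "hello"
cjk_text = "日本語"

ascii_sizes = (len(ascii_text.encode("utf-8")),
               len(ascii_text.encode("utf-16-le")))
cjk_sizes = (len(cjk_text.encode("utf-8")),
             len(cjk_text.encode("utf-16-le")))

middle = utf8_char_at("a日本語".encode("utf-8"), 2)
```

Here ascii_sizes is (5, 10) and cjk_sizes is (9, 6): UTF-8 halves the
ASCII string but makes the CJK string half again as large, and reaching
a given character means stepping over every sequence before it.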
 
-- 
Marc-Andre Lemburg
______________________________________________________________________
Y2000:                                                    51 days left
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/