[I18n-sig] International Components for Unicode

Tom Emerson tree@basistech.com
Sun, 24 Jun 2001 12:15:57 -0400


Martin v. Loewis writes:
> That's true, but I'd rather prefer to integrate the encodings that
> come with the operating systems first. E.g. on Unix, iconv(3) will
> also give you many encodings. Including aliases, glibc 2.2 provides
> about 1100 encodings.

Of course iconv on Linux has a different set of encodings than iconv
on Solaris, which has a different set than on IRIX. And those
encodings that are shared are often implemented differently.

> All these encodings can be made available to Python users by just
> installing an extension module; whereas with ICU, you'd have to
> install some huge library.

You've misunderstood. I'm not saying we pull in ICU. I'm saying that
we write a set of Python modules that can read and make use of the ICU
encoding datafile formats, and use those. In ICU all encoding data is
kept as external data.
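
A minimal sketch of that idea, assuming the data is in the text form of
ICU's mapping tables (.ucm files, with lines such as "<U0041> \x41 |0"
between CHARMAP and END CHARMAP). The sample table, the charset name
"x-sample", and the helper names load_ucm and decode_bytes are all
hypothetical and made up for illustration; they are not part of ICU or
Python. It is written for present-day Python 3, and only roundtrip
mappings (precision indicator |0) are loaded.

```python
import re

# A tiny hand-written fragment in the style of ICU's .ucm text format.
# Illustrative only: it is NOT copied from a real ICU data file.
SAMPLE_UCM = r'''
<code_set_name>  "x-sample"
<mb_cur_max>     2
CHARMAP
<U0041> \x41 |0
<U00E9> \x82 |0
<U4E00> \x88\x9F |0
<U00A5> \x5C |1
END CHARMAP
'''

# One mapping line: <UXXXX>, one or more \xNN bytes, optional |N indicator.
_MAPPING = re.compile(
    r'^<U([0-9A-Fa-f]{4,6})>\s+((?:\\x[0-9A-Fa-f]{2})+)\s*(?:\|(\d))?')


def load_ucm(text):
    """Build (decode, encode) dicts from UCM-style text (hypothetical helper)."""
    decode, encode = {}, {}
    in_charmap = False
    for raw in text.splitlines():
        line = raw.split('#', 1)[0].strip()  # drop comments and whitespace
        if line == 'CHARMAP':
            in_charmap = True
            continue
        if line == 'END CHARMAP':
            break
        if not in_charmap:
            continue  # header lines are ignored in this sketch
        m = _MAPPING.match(line)
        if m is None:
            continue
        if int(m.group(3) or 0) != 0:
            continue  # keep only roundtrip mappings; other indicators skipped
        char = chr(int(m.group(1), 16))
        data = bytes(int(h, 16)
                     for h in re.findall(r'\\x([0-9A-Fa-f]{2})', m.group(2)))
        decode[data] = char
        encode[char] = data
    return decode, encode


def decode_bytes(data, table):
    """Decode by trying the shortest byte sequence first.

    This is enough for the sample table; real multi-byte charsets need
    proper lead-byte handling, which this sketch does not attempt.
    """
    max_len = max(len(key) for key in table)
    out = []
    i = 0
    while i < len(data):
        for n in range(1, max_len + 1):
            chunk = data[i:i + n]
            if chunk in table:
                out.append(table[chunk])
                i += n
                break
        else:
            raise ValueError('unmapped byte sequence at offset %d' % i)
    return ''.join(out)


decode, encode = load_ucm(SAMPLE_UCM)
text = decode_bytes(b'\x41\x82\x88\x9f', decode)
```

The point of the sketch is that the converter logic stays small and the
tables live entirely outside the code, so adding an encoding means
adding a data file rather than linking a library.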

Obviously integrating all of ICU into Python would be a fool's errand.

-- 
Tom Emerson                                          Basis Technology Corp.
Sr. Sinostringologist                              http://www.basistech.com
  "Beware the lollipop of mediocrity: lick it once and you suck forever"