[I18n-sig] Re: Unicode debate

Tom Emerson tree@basistech.com
Thu, 27 Apr 2000 20:01:17 -0400 (EDT)


Bill Tutt writes:
 > > Actually a bigger concern that we hear from our customers in Japan is
 > > that Unicode has *serious* problems in asian languages.  They took
 > > the "unification" of Chinese and Japanese, rather than both, and
 > > therefore cannot represent lots of phrases quite right.  I can have
 > > someone write up a better description, but I was told by several
 > > Japanese people that they wouldn't use Unicode come hell or high
 > > water, basically.

Then tell them to use JIS X 0221 instead of Unicode! Since it is a
Japanese National Standard they'll be pacified into using it, even
though it is nothing more than the Japanese translation of ISO/IEC
10646-1.1993.

This is becoming a bit of an urban legend. It is true that during the
initial Han unification period for Unicode 1.0 there was pushback from
the Japanese, who thought that characters were being left out. But the
issue is really one of glyph variants between Japanese kanji,
Simplified and Traditional Chinese hanzi, and Korean hanja: the same
character can take different forms in each of these locales.

Remember that one of the criteria for the Unified ideographs was that
mapping between legacy encodings and Unicode can be accomplished. If a
character can be found in an existing national standard (a Japanese
one, in this case), then chances are that it has a code point in the
Unified ideographs block.
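
A quick way to see this round-trip property is to push some Japanese
text through a few legacy encodings and back. This is only a minimal
sketch: it assumes a Python whose standard library ships the
`shift_jis`, `euc_jp`, and `iso2022_jp` codecs, and the helper name
`round_trips` and the sample string are made up for illustration.

```python
# Sketch: legacy Japanese encoding -> Unicode -> legacy encoding.
# Codec names are those of the CPython standard library; the helper
# name and sample text are hypothetical.
LEGACY_CODECS = ("shift_jis", "euc_jp", "iso2022_jp")


def round_trips(text, codec):
    """Return True if text survives codec -> Unicode -> codec unchanged."""
    legacy = text.encode(codec)          # Unicode -> legacy bytes
    unicode_text = legacy.decode(codec)  # legacy bytes -> Unicode
    # Both directions must agree for the mapping to be lossless.
    return unicode_text == text and unicode_text.encode(codec) == legacy


sample = "日本語の漢字"
results = {codec: round_trips(sample, codec) for codec in LEGACY_CODECS}
```

If a character had no Unicode code point, the decode step would fail
or substitute, and the comparison would come back False.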

 > Yeah, not all of the east asian ideographs are available in Unicode atm. :(

But most, if not all, of the commonly used characters *are* available
in Unicode 3.0. It is rare, especially for Japanese, to find words
that cannot be encoded in Unicode.

 > Currently there are two pending extensions to the unified CJK ideographs.
 > Extension A is slated as part of the BMP. 0x0000 - 0xAAFF in Plane 2 is

Extension A is part of Unicode 3.0 and will be in the BMP when ISO/IEC
10646.2000 is released.

 > On top of which there is this serious problem of end user defined
 > characters in a number of these MBCS encodings.

Especially true when dealing with the Hong Kong Supplementary
Character Set (HKSCS). However, the HKSAR provides mapping tables
between Big Five and HKSCS and ISO/IEC 10646.1993 and .2000 (two 10646
tables are required since some of the code points in the HKSCS are
included in IEB-A --- the rest should appear in IEB-B). The problem
arises when you want to transcode between Chinese encodings: you cannot
go from HKSCS to GB2312 or GBK --- the mappings simply do not exist.
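
To make the transcoding problem concrete, here is a rough sketch that
pivots through Unicode and reports which characters have no mapping in
the target encoding. It assumes a Python whose standard library ships
the `big5hkscs` and `gb2312` codecs; the function name `transcode` is
hypothetical, and the failing sample is an ordinary traditional-form
character standing in for an HKSCS-specific one.

```python
# Sketch: transcode between Chinese encodings by pivoting through
# Unicode, collecting characters the target encoding cannot represent.
# Function name and sample characters are illustrative assumptions.
def transcode(data, source, target):
    """Return (converted bytes, list of characters with no target mapping)."""
    text = data.decode(source)
    missing = []
    for ch in text:
        try:
            ch.encode(target)
        except UnicodeEncodeError:
            missing.append(ch)
    # Unmappable characters are replaced with '?' in the output bytes.
    converted = text.encode(target, errors="replace")
    return converted, missing


ok_bytes, ok_missing = transcode("中文".encode("big5hkscs"),
                                 "big5hkscs", "gb2312")
bad_bytes, bad_missing = transcode("體".encode("big5hkscs"),
                                   "big5hkscs", "gb2312")
```

The first call converts cleanly; the second reports the character as
missing because GB2312 only carries the simplified form.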

 > Don't forget the new JIS X 0213. :)

Has it been published?

    -tree

-- 
Tom Emerson                                          Basis Technology Corp.
Language Hacker                                    http://www.basistech.com
  "Beware the lollipop of mediocrity: lick it once and you suck forever"