[I18n-sig] Asian Encodings

Brian Takashi Hooper brian@garage.co.jp
Wed, 22 Mar 2000 11:17:43 +0900


Hi again,

One other thing I forgot to mention: we'll have to start thinking about
(canonical) normalization, at least at a rudimentary level, for Asian
encodings. One specific example I can think of is half-width katakana in
Japanese: a few diacritical marks (dakuten and handakuten) are
represented as separate characters, and most encoding packages I've seen
special-case these and turn them into the corresponding precomposed
(canonical) characters.  Without normalization, searching and processing
these characters becomes a bit of a pain.
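As a rough sketch of the half-width case, here is how it looks with present-day Python's `unicodedata.normalize`, which was added to Python later than this message. One caveat: half-width kana only have *compatibility* decompositions, so it is NFKC, not NFC, that folds ｶ + ﾞ into ガ. The `contains` helper is hypothetical and only illustrates normalizing both sides before a search.

```python
import unicodedata

# Half-width katakana KA (U+FF76) followed by the separate half-width
# voiced sound mark, dakuten (U+FF9E): two code points for one kana.
HALFWIDTH_GA = "\uff76\uff9e"
# The precomposed full-width katakana GA (U+30AC).
FULLWIDTH_GA = "\u30ac"


def normalize_for_search(text):
    """Fold text to NFKC so half-width kana + dakuten become one character.

    NFKC maps half-width forms to their full-width compatibility
    equivalents (U+FF9E becomes the combining mark U+3099) and then
    composes canonically, so KA + dakuten becomes precomposed GA.
    """
    return unicodedata.normalize("NFKC", text)


def contains(haystack, needle):
    """Hypothetical search helper: substring match after normalizing both sides."""
    return normalize_for_search(needle) in normalize_for_search(haystack)


if __name__ == "__main__":
    print(FULLWIDTH_GA in HALFWIDTH_GA)                        # False
    print(normalize_for_search(HALFWIDTH_GA) == FULLWIDTH_GA)  # True
    # Half-width "data" (TE + dakuten, long vowel, TA) matches full-width DE.
    print(contains("\uff83\uff9e\uff70\uff80", "\u30c7"))      # True
```

Without the normalization step, the raw half-width string simply never matches the precomposed character, which is the search problem described above.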

So one goal in creating the East Asian codecs should be to add some
normalization support to the existing framework as well; other Unicode
packages and implementations mostly normalize everything to Normalization
Form C (NFC).
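A minimal sketch of where normalization might sit in the codec layer, assuming a hypothetical `decode_normalized` helper that decodes and then normalizes in one step. The `shift_jis` codec and `unicodedata.normalize` used here are from present-day Python, not the codec set available at the time. It also shows how the forms differ: NFC composes full-width KA plus the combining voiced sound mark (U+3099), which is canonically equivalent to precomposed GA, but passes Shift_JIS half-width kana through unchanged; only NFKC folds those.

```python
import unicodedata


def decode_normalized(data, encoding, form="NFC"):
    """Hypothetical helper: decode bytes, then normalize the result.

    `form` is one of "NFC", "NFD", "NFKC", "NFKD".
    """
    return unicodedata.normalize(form, data.decode(encoding))


# Shift_JIS half-width KA (0xB6) + half-width dakuten (0xDE).
sjis_halfwidth_ga = b"\xb6\xde"

# NFC is canonical only: the half-width pair passes through unchanged.
nfc = decode_normalized(sjis_halfwidth_ga, "shift_jis")
# NFKC also applies compatibility mappings and yields precomposed GA.
nfkc = decode_normalized(sjis_halfwidth_ga, "shift_jis", form="NFKC")

# UTF-8 input with full-width KA + combining voiced sound mark (U+3099):
# NFC composes this canonically equivalent pair into U+30AC.
utf8_decomposed_ga = "\u30ab\u3099".encode("utf-8")
composed = decode_normalized(utf8_decomposed_ga, "utf-8")

print(nfc == "\uff76\uff9e")  # True
print(nfkc == "\u30ac")       # True
print(composed == "\u30ac")   # True
```

Defaulting to NFC follows the practice mentioned above; a search index might pass `form="NFKC"` instead.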

For those who aren't familiar with the Unicode Normalization Forms, the
technical report is a good reference:

http://www.unicode.org/unicode/reports/tr15/tr15-18.html

--Brian