[I18n-sig] Big5 Codecs

Tom Emerson tree@basistech.com
Wed, 1 Nov 2000 20:12:27 -0500 (EST)


Frank J.S. Chen writes:
 > 0xA4CA and 0xA2CE have "the same form but with different typeface"
 > in Chinese and are unified in the Unicode Han character set. In fact, they
 > have an identical meaning. So no matter which code point the dictionary
 > uses to convert from Unicode, things will not go badly wrong. But we still
 > need a strategy to filter them out for completeness.

It is more than completeness; it is an issue of semantics. If you
assume that the creator of the Big Five file used a particular code
point for a particular reason, then going to and from Unicode should
not change those semantics.
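
To make the round-trip concern concrete, here is a minimal sketch in
Python. It is not taken from anyone's codec: the table is a three-entry
excerpt I typed in by hand, it assumes both code points in question map
to U+5345 in the Unicode mapping table, and the function names and the
"keep the highest Big Five code" policy are made up for illustration.

```python
from collections import defaultdict

# Hand-typed excerpt, for illustration only. Assumption: both Big Five
# code points from this thread map to U+5345. 0xA440 -> U+4E00 is an
# ordinary one-to-one entry included for contrast.
BIG5_TO_UNICODE = {
    0xA2CE: 0x5345,
    0xA4CA: 0x5345,
    0xA440: 0x4E00,
}


def find_collisions(table):
    """Return {unicode: [big5, ...]} for targets reached from more than one source."""
    sources = defaultdict(list)
    for big5, uni in table.items():
        sources[uni].append(big5)
    return {uni: sorted(codes) for uni, codes in sources.items() if len(codes) > 1}


def build_reverse(table):
    """Build Unicode -> Big Five, keeping the highest Big Five code on a
    collision. This policy is arbitrary and hypothetical."""
    reverse = {}
    for big5, uni in table.items():
        if uni not in reverse or big5 > reverse[uni]:
            reverse[uni] = big5
    return reverse


def round_trip_failures(table):
    """Return the Big Five codes that do not survive Big Five -> Unicode -> Big Five."""
    reverse = build_reverse(table)
    return sorted(big5 for big5, uni in table.items() if reverse[uni] != big5)


if __name__ == "__main__":
    for uni, codes in find_collisions(BIG5_TO_UNICODE).items():
        print("U+%04X <- %s" % (uni, ", ".join("0x%04X" % c for c in codes)))
    changed = round_trip_failures(BIG5_TO_UNICODE)
    print("changed by round trip:", ", ".join("0x%04X" % c for c in changed))
```

Whatever policy the reverse table uses, one of the two source code
points is rewritten to the other on the way back, and that is the change
in semantics I am worried about.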

 > I use the Unicode mapping table for now, not vendor implementations.

OK, understood.

Thanks.

Could you send me your Big Five and GB 2312 codecs so I can put them on
SourceForge, or could you submit them to the python-codecs project
there yourself? I would like to get access to them so I can try them
out.

    -tree

 > ----------------------------------------------------------------------------
 > Chen Chien-Hsun
 > Taipei,Taiwan,R.O.C.
 > 

-- 
Tom Emerson                                          Basis Technology Corp.
Zenkaku Language Hacker                            http://www.basistech.com
  "Beware the lollipop of mediocrity: lick it once and you suck forever"