[I18n-sig] Big5 Codecs
Tom Emerson
tree@basistech.com
Wed, 1 Nov 2000 20:12:27 -0500 (EST)
Frank J.S. Chen writes:
> 0xA4CA and 0xA2CE have "the same form but with different typeface"
> in Chinese and are unified into Unicode Han character set. In fact, they
> have an identical meaning. So no matter what code point the dictionary
> uses to convert from Unicode, things will not go badly wrong. But we still
> need a strategy to filter them out for completeness.
It is more than completeness; it is an issue of semantics. If you
assume that the creator of the Big Five file used a particular code
point for a particular reason, then going to and from Unicode should
not change those semantics.
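
To make the round-trip concern concrete, here is a minimal sketch in
Python. The two-entry tables and the function names are hypothetical,
made up for illustration; a real codec would load a full mapping. It
assumes both Big Five code points map to U+5345, and it shows that a
naive inverse table can return only one of them:

```python
# Hypothetical two-entry excerpt of a Big Five -> Unicode table.
# Assumption: both code points are unified to U+5345.
BIG5_TO_UNICODE = {
    0xA4CA: 0x5345,
    0xA2CE: 0x5345,  # assumed duplicate of 0xA4CA
}

# Naive inverse table: one Big Five code point per Unicode code point.
# Entries are visited in ascending Big Five order and later entries
# overwrite earlier ones, so the highest Big Five code point wins.
UNICODE_TO_BIG5 = {ucs: big5 for big5, ucs in sorted(BIG5_TO_UNICODE.items())}


def round_trip(big5):
    """Convert a Big Five code point to Unicode and back again."""
    return UNICODE_TO_BIG5[BIG5_TO_UNICODE[big5]]


def lossy_code_points():
    """Return the Big Five code points that do not survive a round trip."""
    return sorted(b for b in BIG5_TO_UNICODE if round_trip(b) != b)


if __name__ == "__main__":
    for code in lossy_code_points():
        print("0x%04X comes back as 0x%04X" % (code, round_trip(code)))
```

With these tables, 0xA2CE comes back as 0xA4CA, so whatever reason the
author had for choosing 0xA2CE is lost. A codec that wants to preserve
that choice needs some extra rule for the duplicates.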
> I use Unicode mapping table for now, not vendor implementations.
OK, understood.
Thanks.
Could you send me your Big5 and GB 2312 codecs so I can put them on
SourceForge, or submit them yourself to the python-codecs project
there? I would like to get access to these so I can try them out.
-tree
> ----------------------------------------------------------------------------
> Chen Chien-Hsun
> Taipei,Taiwan,R.O.C.
>
--
Tom Emerson Basis Technology Corp.
Zenkaku Language Hacker http://www.basistech.com
"Beware the lollipop of mediocrity: lick it once and you suck forever"