[I18n-sig] Codecs for Big Five and GB 2312

Tamito KAJIYAMA kajiyama@grad.sccs.chukyo-u.ac.jp
Fri, 27 Oct 2000 10:10:10 +0900


Tom Emerson <tree@basistech.com> writes:
| I need codecs for transcoding to and from Big Five and GB 2312: has
| anyone written these yet? If not, I'll do it, but I would rather not
| duplicate the work.

I maintain a codec package named JapaneseCodecs, which provides
codecs for two Japanese encodings, EUC-JP and Shift JIS.  Like
Big5, both are 8-bit multibyte encodings, so you may be able to
use my codecs as a starting point for implementing a Big5 codec.
The JapaneseCodecs package is available at:

http://pseudo.grad.sccs.chukyo-u.ac.jp/~kajiyama/python/
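
To illustrate what these 8-bit multibyte encodings have in
common, here is a minimal sketch of a Big5 decoding loop.  It is
not taken from JapaneseCodecs.  It is written for present-day
Python 3, the names split_big5, decode_big5 and TINY_TABLE are
made up for this example, and it assumes that any byte in
0x81-0xFE is the lead byte of a two-byte character.  A real codec
would also validate the trail byte and carry the full mapping
table.

```python
# Hypothetical two-entry mapping table (Big5 bytes -> Unicode).
# A real codec needs the complete table.
TINY_TABLE = {
    b"\xa4\xa4": "\u4e2d",  # U+4E2D
    b"\xa4\xe5": "\u6587",  # U+6587
}


def split_big5(data):
    """Split a Big5 byte string into one- and two-byte units.

    Assumption: a byte in 0x81-0xFE starts a two-byte character;
    every other byte stands alone.
    """
    units = []
    i = 0
    while i < len(data):
        if 0x81 <= data[i] <= 0xFE:
            if i + 1 >= len(data):
                raise ValueError("truncated two-byte character")
            units.append(data[i:i + 2])
            i += 2
        else:
            units.append(data[i:i + 1])
            i += 1
    return units


def decode_big5(data, table=TINY_TABLE):
    """Decode Big5 bytes using a mapping table for two-byte units."""
    out = []
    for unit in split_big5(data):
        if len(unit) == 2:
            out.append(table[unit])  # KeyError if the table lacks it
        else:
            out.append(unit.decode("ascii"))
    return "".join(out)
```

EUC-JP and Shift JIS can be handled with the same kind of loop.
Only the byte ranges and the mapping table differ.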

For personal use I also wrote a preliminary codec for a subset
of ISO 2022 (or, strictly speaking, a subset of the Emacs/MULE
internal encoding, which is in turn an extension of ISO 2022).
Currently the codec can handle text that contains Japanese,
Thai, and Vietnamese characters.  The codec was written without
regard to efficiency, but it works.  Since GB 2312 is an
encoding based on ISO 2022, the codec may be a starting point
for it, too.  The only things that need to be done to handle
GB 2312 are to add a character mapping table and the escape
sequences that designate the character sets.  If you are
interested, the codec is available at:

http://pseudo.grad.sccs.chukyo-u.ac.jp/~kajiyama/python/iso_2022_7bit.py.gz
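
As a rough sketch of those two additions (this is not the code
in iso_2022_7bit.py), the decoder below switches character sets
on two ISO 2022 escape sequences: ESC $ A, which designates
GB 2312, and ESC ( B, which designates ASCII.  It is written for
present-day Python 3, and the function name is made up for this
example.  Instead of carrying its own mapping table, it assumes
it may borrow Python's built-in "gb2312" codec, which expects the
same two bytes with the high bit set.

```python
ESC = 0x1B

# Escape sequences that designate a character set into G0.
DESIGNATIONS = {
    b"\x1b(B": "ascii",   # ESC ( B
    b"\x1b$A": "gb2312",  # ESC $ A
}


def decode_iso2022_gb(data):
    """Decode a 7-bit ISO 2022 stream holding ASCII and GB 2312."""
    out = []
    charset = "ascii"
    i = 0
    while i < len(data):
        byte = data[i]
        if byte == ESC:
            seq = data[i:i + 3]
            if seq not in DESIGNATIONS:
                raise ValueError("unsupported escape sequence: %r" % seq)
            charset = DESIGNATIONS[seq]
            i += 3
        elif byte >= 0x80:
            raise ValueError("8-bit byte in a 7-bit stream")
        elif charset == "gb2312" and 0x21 <= byte <= 0x7E:
            pair = data[i:i + 2]
            if len(pair) < 2:
                raise ValueError("truncated two-byte character")
            # Set the high bit on both bytes so that the built-in
            # codec can stand in for the mapping table.
            out.append(bytes(b | 0x80 for b in pair).decode("gb2312"))
            i += 2
        else:
            # ASCII, or a space or control character in any mode.
            out.append(chr(byte))
            i += 1
    return "".join(out)
```

A complete codec would also need the encoding direction and the
other designations it wants to support.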

Regards,

-- 
KAJIYAMA, Tamito <kajiyama@grad.sccs.chukyo-u.ac.jp>