[I18n-sig] Fw: Codecs for Japanese character encodings

Andy Robinson andy@reportlab.com
Mon, 10 Apr 2000 20:49:16 +0100


(I forwarded this to the SIG on Friday, but it failed to appear - hope you
don't all get it twice).
Tamito Kajiyama has written pure Python codecs for the two main Japanese
encodings!  Many thanks!

They include the 6879 characters in the JIS0208 character set in literal
Python dictionaries; so it should be trivial to write modified ones which
support vendor-specific extensions with a few extra characters, as long as
the extras are in Unicode.
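
A minimal sketch of that idea, not taken from Tamito's code: the table
names, the helper functions and the three-entry base table below are all
hypothetical stand-ins for the real 6879-entry dictionaries, and the single
vendor entry is only an illustration.

```python
# Hypothetical names throughout; the real codecs' internals may differ.

# Tiny stand-in for a JIS0208 decoding table: 2-byte code -> Unicode char.
BASE_DECODING_MAP = {
    0x2121: "\u3000",  # IDEOGRAPHIC SPACE
    0x2422: "\u3042",  # HIRAGANA LETTER A
    0x3021: "\u4e9c",  # first kanji in the level 1 block
}

# Illustrative vendor-specific extra (assumed here to live in row 13).
VENDOR_EXTRAS = {
    0x2D21: "\u2460",  # CIRCLED DIGIT ONE
}


def extend_map(base, extras):
    """Return a new decoding map with the extras layered over the base."""
    merged = dict(base)  # copy, so the base table is left untouched
    for code, char in extras.items():
        if code in merged and merged[code] != char:
            raise ValueError("extra 0x%04X conflicts with base table" % code)
        merged[code] = char
    return merged


def invert_map(decoding_map):
    """Build the encoding map (Unicode char -> code) from a decoding map."""
    return {char: code for code, char in decoding_map.items()}


def decode_codes(data, decoding_map):
    """Decode a bytes object holding raw 2-byte codes, high byte first."""
    if len(data) % 2:
        raise ValueError("truncated 2-byte sequence")
    chars = []
    for i in range(0, len(data), 2):
        code = (data[i] << 8) | data[i + 1]
        chars.append(decoding_map[code])  # KeyError for unmapped codes
    return "".join(chars)


VENDOR_DECODING_MAP = extend_map(BASE_DECODING_MAP, VENDOR_EXTRAS)
VENDOR_ENCODING_MAP = invert_map(VENDOR_DECODING_MAP)
```

With these definitions, decode_codes(b"\x24\x22\x2d\x21", VENDOR_DECODING_MAP)
decodes both characters, while the same call against BASE_DECODING_MAP
raises KeyError on the vendor code.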

I'm now rewriting something I did last year in-house for a customer - a
script to generate HTML tables and text files which exactly match the layout
of the code charts for JIS0208 in "CJKV Information Processing".  I ran
these through both codecs and viewed the results in IE5, and as far as I can
see the results are perfect.  I will post up my scripts when they look a bit
prettier :-)

It would be nice to put this code somewhere 'out there' so people can work
on it - not just codecs, but test suites.  How do people feel about starting
a project on www.sourceforge.net under CVS?

Since lots of us want to work on fast Asian codecs, another thing we need
is a 'benchmark suite' - maybe a megabyte of Japanese text (mixing
everything - ASCII, Kanji, half-width katakana?).  We can then use these pure
Python codecs as a baseline.
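
A rough sketch of what such a harness could look like.  Everything here is
an assumption for illustration: the sample text, the function names, and
the use of the "euc_jp" and "shift_jis" codec names from a current Python
standard library as stand-ins for whichever codecs are being measured.

```python
import time

# Hypothetical sample line mixing ASCII, kanji/hiragana and half-width
# katakana; a real suite would want natural text rather than a repeat.
SAMPLE = (
    "ASCII text 123 "
    "\u65e5\u672c\u8a9e\u306e"  # kanji "nihongo" plus hiragana "no"
    "\uff76\uff85"              # half-width katakana "ka", "na"
    "\n"
)


def build_corpus(min_chars):
    """Repeat the sample line until the text has at least min_chars chars."""
    repeats = min_chars // len(SAMPLE) + 1
    return SAMPLE * repeats


def time_codec(encoding, text, rounds=3):
    """Return best-of-N encode and decode times (seconds) for one codec."""
    best_encode = best_decode = float("inf")
    for _ in range(rounds):
        start = time.perf_counter()
        data = text.encode(encoding)
        middle = time.perf_counter()
        decoded = data.decode(encoding)
        end = time.perf_counter()
        if decoded != text:
            raise ValueError("round trip through %s lost data" % encoding)
        best_encode = min(best_encode, middle - start)
        best_decode = min(best_decode, end - middle)
    return {
        "encoding": encoding,
        "bytes": len(data),
        "encode_s": best_encode,
        "decode_s": best_decode,
    }


if __name__ == "__main__":
    corpus = build_corpus(1000000)  # roughly a million characters
    for name in ("euc_jp", "shift_jis"):
        print(time_codec(name, corpus))
```

The round-trip check doubles as a crude correctness test, so a codec that
is fast but lossy fails the benchmark instead of winning it.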

- Andy Robinson

----- Original Message -----
From: Tamito KAJIYAMA <kajiyama@grad.sccs.chukyo-u.ac.jp>
To: <andy@reportlab.com>
Sent: 07 April 2000 18:13
Subject: Re: Codecs for Japanese character encodings


> andy@reportlab.com (Andy Robinson) writes:
> |
> | >Based on the Python Unicode support proposal, I wrote codecs for
> | >two Japanese character encodings EUC-JP and Shift_JIS.  The codecs
> | >are available at the following location:
> | >
> | >http://pseudo.grad.sccs.chukyo-u.ac.jp/~kajiyama/tmp/japanese-codecs.tar.gz
> |
> | Many thanks for this!  I have copied it to the Internationalisation
> | Special Interest Group, where we discuss this stuff, and taken the
> | liberty of copying your message.
>
> Good news.  Thanks for the coordination.
>
> | We need to start coordinating a separate codecs library for
> | Asian languages, and I'd like to use this as a starting point
> | if OK with you.
>
> That's absolutely okay.  I'm glad if my codecs contribute to the
> i18n SIG.  I joined the i18n-sig@python.org just after I got
> your message.  Please carry on the further discussion about the
> Japanese codecs (if any) in the list.
>
> Best regards,
>
> --
> KAJIYAMA, Tamito <kajiyama@grad.sccs.chukyo-u.ac.jp>
>