[Mailman-Developers] Proper solution to Mailman CVS's Japanese problems

Ben Gertzfield che@debian.org
Wed, 26 Sep 2001 16:30:53 +0900


Unfortunately, my grand plans may be for nothing.

GNU gettext does not support storing messages in ISO-2022-JP.

At all.

From gettext's info file:

     Because the PO files must be portable to operating systems with
     less advanced internationalization facilities, the character
     encodings that can be used are limited to those supported by both
     GNU `libc' and GNU `libiconv'. These are: `ASCII', `ISO-8859-1',
     `ISO-8859-2', `ISO-8859-3', `ISO-8859-4', `ISO-8859-5',
     `ISO-8859-6', `ISO-8859-7', `ISO-8859-8', `ISO-8859-9',
     `ISO-8859-13', `ISO-8859-15', `KOI8-R', `KOI8-U', `CP850',
     `CP866', `CP874', `CP932', `CP949', `CP950', `CP1250', `CP1251',
     `CP1252', `CP1253', `CP1254', `CP1255', `CP1256', `CP1257',
     `GB2312', `EUC-JP', `EUC-KR', `EUC-TW', `BIG5', `BIG5-HKSCS',
     `GBK', `GB18030', `SHIFT_JIS', `JOHAB', `TIS-620', `VISCII',
     `UTF-8'.

This is most frustrating.  They support Shift-JIS, Microsoft's evil
charset, but not the standard ISO-2022-JP.

I tried forcing ISO-2022-JP encoded strings into the po file, but
msgfmt barfs:

[ben@nausicaa:~/src/mailman/mailman/messages/ja/LC_MESSAGES]% msgfmt mailman.po
mailman.po: warning: Charset "iso-2022-jp" is not a portable encoding name.
                     Message conversion to user's charset might not work.
mailman.po:35: invalid control sequence
mailman.po:55: keyword "$7" unknown
mailman.po:56: end-of-line within string

Escaping the literal ESC characters as \033 works, but that does
nothing for the many $ characters common to ISO-2022-JP text.  I don't
know how to work around that, and besides, it's ridiculous to ask
translators to change all ESC characters to \033 and all $ characters
to something else.
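
To see where the ESC and $ bytes come from, here is a small sketch.
It assumes a present-day Python, whose standard library ships an
iso2022_jp codec; the sample string is only an illustration and is not
taken from Mailman's message catalog.

```python
# Illustrative sample text, not one of Mailman's messages.
sample = "日本語"

encoded = sample.encode("iso2022_jp")

# ISO-2022-JP switches into JIS X 0208 with ESC $ B and back to ASCII
# with ESC ( B, so both ESC and "$" show up in the raw byte stream.
print(repr(encoded))
print("contains ESC:", b"\x1b" in encoded)
print("contains $:  ", b"$" in encoded)
```

Every run of Japanese text carries at least one such escape sequence,
so a translator would have to hand-escape them throughout the po file.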

It seems that our only solution, then, is truly to convert ISO-2022-JP
encoded messages to EUC-JP when they're displayed on the web -- but
not when they're being archived.
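
A minimal sketch of that conversion, assuming a modern Python where the
iso2022_jp and euc_jp codecs are built in (with the Python versions
current as of this writing, an add-on module has to supply them).  The
function name to_web_euc is made up for illustration and is not part
of Mailman.

```python
def to_web_euc(raw_mail_body: bytes) -> bytes:
    """Re-encode an ISO-2022-JP body as EUC-JP for display on a web page.

    Hypothetical helper; the archived copy should keep the original bytes.
    """
    text = raw_mail_body.decode("iso2022_jp")
    return text.encode("euc_jp")


# Illustrative input: a short Japanese string as it would arrive by mail.
mail_body = "日本語のメール".encode("iso2022_jp")
web_body = to_web_euc(mail_body)

# The mail copy is pure 7-bit; the web copy uses 8-bit EUC-JP bytes.
print(all(b < 0x80 for b in mail_body), any(b >= 0x80 for b in web_body))
```

The conversion happens only on the display path, so whatever is written
to the archive is still the untouched ISO-2022-JP original.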

This is possible either through the kconv module, from
http://tomigaya.shibuya.tokyo.jp/~mak/kconv/ , or the Japanese codec
for Python 1.6 or later, at
http://pseudo.grad.sccs.chukyo-u.ac.jp/~kajiyama/python/ .

In either case, one of these will have to be bundled with the next
Mailman release, or Japanese support should be removed, since Mailman
currently will happily send out mails encoded in 8-bit EUC-JP.  Very
very bad.  We could just translate the template mails into ISO-2022-JP
and force a Content-Type: text/plain; charset=iso-2022-jp on any
Japanese-language mails sent out, but that would leave the admin
interface useless for Japanese mails.
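
For the outgoing side, forcing the charset might look roughly like
this.  It is only a sketch: it uses the email package from today's
Python standard library rather than anything in Mailman itself, and the
addresses and body text are placeholders.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "list-owner@example.com"   # placeholder address
msg["To"] = "subscriber@example.com"     # placeholder address
msg["Subject"] = "Welcome"

# Encode the body as ISO-2022-JP and insist on a 7-bit transfer encoding,
# so no 8-bit EUC-JP bytes end up in the outgoing message.
body = "メーリングリストへようこそ\n"
msg.set_content(body, charset="iso-2022-jp", cte="7bit")

print(msg["Content-Type"])
print(msg["Content-Transfer-Encoding"])
```

This only covers what goes out over SMTP; it does nothing for text
shown in the web interface.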

Barry?  I hope you're still out there, because it's your call which
Japanese module to include.  Both have pure Python and hybrid
Python-C versions, with compilation and speed tradeoffs for
each.  kconv is GPL'd, and the Japanese codec has a BSD-like license.

Ben

-- 
Brought to you by the letters U and D and the number 16.
"A squib is a firecracker."
Debian GNU/Linux maintainer of Gimp and GTK+ -- http://www.debian.org/