[Mailman-Developers] I18N concerns for Japanese

Brian Takashi Hooper brian@garage.co.jp
Wed, 09 Feb 2000 17:00:01 +0900


Hi there -

I just checked out the latest version of Mailman from CVS and am taking
a look at it with an eye to making it work better with our environment
here, which is Japanese.  I think that Mailman would be very popular in
Japan too with a localized interface and a few tweaks to ensure that
mail goes out in the proper format.

I haven't been on Mailman-dev for long, so forgive me if I'm rehashing
an old topic (I checked through the archives as best I could, but I
apologize in advance for any redundancy):

1. Headers:

To, Subject:

The To: and Subject: lines in Japanese messages often contain Japanese;
the proper encoding for this seems to be base64-encoded ISO-2022-JP (JIS).
Mail received from list members is of course no problem, but mail which
the system sends out, if it contains Japanese, needs to be able to do
this conversion.  Looking at an example implementation (Mew, an Emacs
module which does Japanese mail), it looks like only those parts of the
header which contain JIS escape sequences are encoded, that is, ASCII
parts of the To: and Subject: lines are not encoded.  I think this makes
sense, since encoding the entirety of To: would obscure the recipient
address, which might not be good for some mailers.
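A minimal sketch of that partial encoding, assuming a Python whose
standard library provides the email.header module (this is not something
Mailman does today).  The helper names encode_header_value and
decode_header_value, and the example address used with them, are made up
for illustration:

```python
import re
from email.header import Header, decode_header, make_header

# Split a header value into alternating ASCII and non-ASCII runs.
_RUNS = re.compile(r'[\x00-\x7f]+|[^\x00-\x7f]+')


def encode_header_value(text, charset='iso-2022-jp'):
    """Encode only the non-ASCII runs of a header value.

    ASCII runs (such as the address in a To: line) are appended as
    us-ascii and stay readable; everything else is appended in the
    given charset and comes out as an RFC 2047 encoded-word.
    """
    header = Header()
    for run in _RUNS.findall(text):
        if ord(run[0]) < 128:
            header.append(run, 'us-ascii')
        else:
            header.append(run, charset)
    return header.encode()


def decode_header_value(encoded):
    """Turn an encoded header value back into a plain string."""
    return str(make_header(decode_header(encoded)))
```

Calling encode_header_value('山田 <yamada@example.jp>') should leave the
address part readable and encode only the name in front of it.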

I assume there are other standards out there for inserting non-ASCII
characters in the To: and Subject: lines for various languages, which
probably do similar tricks; are there any folks out there who have
similar requirements or experiences?

I think that my difficulty would be solved if it were possible to specify
a filter to be applied to outgoing message headers; by default this could
be None, or a filter which just returns its string argument unchanged.
This should also be linked to the specification of the charset for the
message.
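Roughly what I have in mind, as a sketch only: none of these names
(identity_filter, make_charset_filter, filter_headers) exist in Mailman,
and the encoding step assumes a Python with the email.header module
available.

```python
from email.header import Header


def identity_filter(value):
    """Default filter: return the header value unchanged."""
    return value


def make_charset_filter(charset):
    """Build a header filter tied to the message charset (hypothetical).

    Pure-ASCII values pass through untouched; anything else is encoded
    as an RFC 2047 encoded-word in the given charset.
    """
    def charset_filter(value):
        if all(ord(ch) < 128 for ch in value):
            return value
        return Header(value, charset).encode()
    return charset_filter


def filter_headers(headers, header_filter=None):
    """Apply a filter to every (name, value) pair of outgoing headers.

    A filter of None behaves like identity_filter.
    """
    if header_filter is None:
        header_filter = identity_filter
    return [(name, header_filter(value)) for name, value in headers]
```

A Japanese list would then be configured with
make_charset_filter('iso-2022-jp'), while every other list keeps the
default and sees no change in behavior.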

Content-type, Mime-Version:

Typically, Japanese mail sets the Content-type and Mime-Version headers like so:

Content-type: text/plain; charset=ISO-2022-JP
Mime-Version: 1.0

Looking at the Mailman source, it looks like charset=us-ascii is
hardcoded for a few types of messages (ToDigest, bounce); as far as I
can tell it's not set at all for outgoing admin messages (someone please
correct me if I'm wrong on this).

The message body of a Japanese message is normally sent in ISO-2022-JP
as-is, without base64 or any similar transfer encoding, but I don't think
this has any implications vis-a-vis Mailman's current behavior.
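For illustration, a message with the headers shown above and an
untransformed ISO-2022-JP body can be built like this, assuming a Python
with the email.mime.text module available; build_japanese_message is a
made-up helper, not Mailman code:

```python
from email.mime.text import MIMEText


def build_japanese_message(body):
    """Return a text/plain message whose body is ISO-2022-JP.

    ISO-2022-JP uses only 7-bit bytes (escape sequences switch between
    ASCII and JIS), so the body needs no base64 or quoted-printable
    transfer encoding.
    """
    return MIMEText(body, 'plain', 'iso-2022-jp')


msg = build_japanese_message('こんにちは')
print(msg['Content-Type'])
print(msg['MIME-Version'])
print(msg['Content-Transfer-Encoding'])
```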

So, I guess what I'm suggesting is adding the ability to add custom
filters to do special header processing based on locale... Opinions,
additions, flames anyone?

2. Templates

We have already done some localization of templates here, and will
continue to do so until we've finished... in this case, maybe it would
be nice to offer downloads of drop-in-replacement template folders for
different languages somewhere.  I volunteer our templates when they are
done.

Obviously there are other inline text messages which also require some
mechanism for localization; maybe gettext?
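As a rough idea of how gettext could look from the Python side, assuming
the standard gettext module; the 'mailman' domain name, the 'messages'
catalog directory, and the get_translator helper are all invented for
this sketch:

```python
import gettext


def get_translator(language, localedir='messages', domain='mailman'):
    """Return a translation function for the given language.

    With fallback=True, a missing catalog yields a translator that
    returns the original English string instead of raising an error.
    """
    catalog = gettext.translation(
        domain, localedir=localedir, languages=[language], fallback=True)
    return catalog.gettext


_ = get_translator('ja')
print(_('Subscribe'))
```

Inline strings in the source would then be wrapped in _(), and each
locale would ship a compiled message catalog next to its templates.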

Are there any other localization issues which people have come across
for other locales?

--Brian Hooper
Tokyo, Japan