[Mailman-i18n] "Funny" characters in real names?

Barry A. Warsaw barry@python.org
Fri, 4 Oct 2002 17:29:54 -0400


[CC'ing François as the recode guru -BAW]

I think I finally, mostly, understand this issue and have fixes in
place for it.  To test it I added three users to a list, one with a
pure ascii name, one with an iso-8859-1 name and one with Tokio
Kikuchi's name. :) I can now see the names on the admin page and the
options page, and properly encoded in email.  Everything's checked in
so I'd love to get some feedback.

Basically, when encoding for html, I want to end up with a byte string
that is encoded in the charset of the web page, with any funny
characters htmlref'd.  When encoding for email, I want to end up with
a Unicode string that the Header class can munch on to produce an RFC
2047 encoding.
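
A minimal sketch of the two targets described above, assuming Python 3's
standard library rather than Mailman's actual code.  The helper names
name_for_html and name_for_header are hypothetical, chosen only for
illustration.

```python
import html
from email.header import Header, decode_header, make_header


def name_for_html(name, page_charset):
    """Return a byte string in the page's charset.

    Characters the charset cannot represent become numeric character
    references (the "htmlref'd" step); markup characters are escaped first.
    """
    return html.escape(name).encode(page_charset, "xmlcharrefreplace")


def name_for_header(name, charset="utf-8"):
    """Return an RFC 2047 encoded string built from a Unicode name."""
    return Header(name, charset).encode()


if __name__ == "__main__":
    latin1_name = "Fran\u00e7ois"
    print(name_for_html(latin1_name, "us-ascii"))
    print(name_for_html(latin1_name, "iso-8859-1"))
    encoded = name_for_header(latin1_name, "iso-8859-1")
    print(encoded)
    # Decoding the header again should give back the original name.
    print(str(make_header(decode_header(encoded))))
```

On an us-ascii page the c-cedilla comes out as a character reference,
while on an iso-8859-1 page it is emitted as a single raw byte.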

I have one nit that I'm not sure how to address though.  I use VM in a
MULE-ified XEmacs to read my mail and it has pretty good header
decoding routines.  If the charset is provided, it'll show me the real
characters in a presentation buffer.  However, it needs help for
utf-8.  Here's an example of a To header sent to my address with TK's
Unicode name as the full name.  Note that I've set this member's
language to English, so Mailman doesn't know what character set the
full name is in, but hey, it's stored internally as Unicode.

    To: =?utf-8?b?6I+K5Zyw5pmC5aSr?= <barry@wooz.org>

I can take the characters 6I+K5Zyw5pmC5aSr, stick them in a file, and
see the Japanese characters in a shell buffer when I execute the
following command:

    % recode utf-8/b64..iso-2022-jp < /tmp/tk.txt
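
For comparison, the same conversion can be sketched in Python 3 with
only the standard library.  This is an illustration, not what VM or
Mailman does; decode_encoded_word is a hypothetical helper name, and it
handles a single encoded word only.

```python
from email.header import decode_header


def decode_encoded_word(word):
    """Decode one RFC 2047 encoded word into a Unicode string."""
    raw, charset = decode_header(word)[0]
    if isinstance(raw, str):
        # Plain ASCII input with no encoded word comes back as str.
        return raw
    return raw.decode(charset or "us-ascii")


# The encoded word from the To header above.
name = decode_encoded_word("=?utf-8?b?6I+K5Zyw5pmC5aSr?=")

# Equivalent of the "..iso-2022-jp" half of the recode request.
as_jis = name.encode("iso-2022-jp")

if __name__ == "__main__":
    print(name)
    print(as_jis)
```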

VM, however, doesn't show me the Japanese characters for this name
because it has no internal support for utf-8 decoding.  It can call
out to recode (or iconv) after doing the base64 conversion internally,
but only if I know the target charset and the command line.  And I
/don't/ know the target charset any more than Mailman does.  Just
about anything can be utf-8 encoded, right?
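
One possible workaround, sketched here purely as an assumption and not
as anything VM, recode, or Mailman provides: since utf-8 says nothing
about the script, try a short list of candidate display charsets and
keep the first one that can represent the decoded text.  The function
name and the candidate list are hypothetical.

```python
def pick_display_charset(text,
                         candidates=("us-ascii", "iso-8859-1", "iso-2022-jp")):
    """Return the first candidate charset able to encode text.

    Falls back to utf-8 when none of the candidates fits.
    """
    for charset in candidates:
        try:
            text.encode(charset)
        except UnicodeEncodeError:
            continue
        return charset
    return "utf-8"


if __name__ == "__main__":
    for sample in ("Barry", "Fran\u00e7ois", "\u83ca\u5730\u6642\u592b"):
        print(sample, pick_display_charset(sample))
```

The order of the candidates matters: a name that fits several charsets
is assigned to the first match, so this is a heuristic at best.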

Maybe there's no way to hook it up with VM
(vm-mime-charset-converter-alist for the Emacsheads in the audience),
or maybe I just don't know what I'm talking about. :)  Since I don't
know the target character set, is there really anything I can do?  Or
am I not encoding the full name as well as I can?

If anybody has any clues to help me get VM to display the characters,
they would be greatly appreciated.

-Barry