[Mailman-i18n] Pipermail and non-English lists

Fri Nov 22 00:19:17 2002

Hi,

> Why does it need to be part of the global pipeline?  I'd imagine it's
> only necessary where email messages might get displayed in http, so
> that'd mean Pipermail and the admindb.

Because there are many places in the http and mail services where a
message may be displayed or decorated, I want to convert the charset
only once. I also use namazu to index the archive, and it is convenient
if the archive uses a single charset and contains no Unicode.

> Most languages aren't going to need alternative charsets, right?  I'm
> guessing Japanese and perhaps other Asian languages (but I don't know
> for sure).  What would be the value of altcharsets for Japanese?

Since we set 'euc-jp' as the standard charset for Japanese, the
remaining ones (the altcharsets) are 'iso-2022-jp', 'shift_jis' and
'cp932'.
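
A minimal sketch of how such a table could be looked up and used,
written in Python 3 syntax rather than the Python 2 of Mailman 2.1. The
three-element tuple layout, the local LC_DESCRIPTIONS stand-in and the
helper names charsets_for and to_standard are assumptions made for
illustration; they are not stock Mailman.

```python
# Stand-in for mm_cfg.LC_DESCRIPTIONS (assumption: an optional third
# element lists the alternative charsets for that language).
LC_DESCRIPTIONS = {
    'ja': ('Japanese', 'euc-jp', ('iso-2022-jp', 'shift_jis', 'cp932')),
    'en': ('English (USA)', 'us-ascii'),
}


def charsets_for(lang):
    """Return (stdcharset, altcharsets) for a language code."""
    entry = LC_DESCRIPTIONS[lang]
    stdcharset = entry[1]
    # Languages without a third element simply have no alternatives.
    altcharsets = entry[2] if len(entry) > 2 else ()
    return stdcharset, altcharsets


def to_standard(data, charset, lang):
    """Re-encode bytes into the standard charset when the incoming
    charset is one of the alternatives; otherwise return them as-is."""
    stdcharset, altcharsets = charsets_for(lang)
    if charset.lower() in altcharsets:
        return data.decode(charset, 'replace').encode(stdcharset, 'replace')
    return data


text = '\u65e5\u672c\u8a9e'  # the word "nihongo" (Japanese language)
for cs in ('iso-2022-jp', 'shift_jis', 'cp932'):
    # Every alternative ends up as the same euc-jp byte string.
    assert to_standard(text.encode(cs), cs, 'ja') == text.encode('euc-jp')
```

For a language such as 'en' the alternative list is empty, so nothing
is converted.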

Here is my version of Entry.py, which will be part of a Japanese patch
I am going to release. I still need to write Exit.py (or something
similar) to convert back on the way out, because 'iso-2022-jp' is the
de facto standard for Japanese mail.
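
The exit side does not exist yet, so the following is only a rough
sketch of what it might do, in Python 3 syntax. The function name
convert_for_delivery and its parameters are hypothetical. It walks the
message and re-encodes text/plain parts from the standard charset back
to the mail charset, setting the headers by hand in the same way as the
entry code.

```python
import email


def convert_for_delivery(msg, stdcharset='euc-jp', mailcharset='iso-2022-jp'):
    """Re-encode text/plain parts from stdcharset to mailcharset in place."""
    for part in msg.walk():
        if part.get_content_type() != 'text/plain':
            continue
        if part.get_content_charset() != stdcharset:
            continue
        text = part.get_payload(decode=True).decode(stdcharset, 'replace')
        # Characters that do not exist in mailcharset become '?'.
        out = text.encode(mailcharset, 'replace')
        # iso-2022-jp output is pure 7-bit, so it can be stored as ASCII
        # text.  (This assumes a 7-bit mailcharset.)
        part.set_payload(out.decode('ascii'))
        del part['Content-Type']
        del part['Content-Transfer-Encoding']
        part['Content-Type'] = 'text/plain; charset=%s' % mailcharset
        part['Content-Transfer-Encoding'] = '7bit'


body = '\u65e5\u672c\u8a9e'
raw = (b'Content-Type: text/plain; charset=euc-jp\n'
       b'Content-Transfer-Encoding: 8bit\n\n' + body.encode('euc-jp') + b'\n')
msg = email.message_from_bytes(raw)
convert_for_delivery(msg)
assert msg.get_content_charset() == 'iso-2022-jp'
```

Parts in any other charset, and non-text parts, are left untouched.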

===========================================
from Mailman import Message, mm_cfg, i18n
from email.Header import decode_header
import re

_ = i18n._

def get_header_decoded(h):
     # Decode a MIME header AND convert it to the standard charset if
     # it arrived in one of the alternate charsets.
     h = decode_header(h)
     hs = ''
     spc = ''
     for (s, c) in h:
         # altcharsets is None when the language defines no alternates.
         if c and altcharsets and c in altcharsets:
             s = unicode(s, c, 'replace').encode(stdcharset)
         hs = hs + spc + s
         # Put a space after an unencoded (ascii) chunk only.
         if c is None or c == 'us-ascii':
             spc = ' '
         else:
             spc = ''
     return hs

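def process(mlist, msg, msgdata):
     global stdcharset, altcharsets
     lang = mlist.preferred_language
     stdcharset = mm_cfg.LC_DESCRIPTIONS[lang][1]
     try:
         altcharsets = mm_cfg.LC_DESCRIPTIONS[lang][2]
     except IndexError:
         # No third element: this language has no alternate charsets.
         altcharsets = None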
#
     hs = get_header_decoded(msg.get('subject', _('no subject')))
     msgdata['subject'] = hs
     del msg['subject']
     msg['Subject'] = hs
#
     hs = get_header_decoded(msg['from'])
     msgdata['from'] = hs
     msg['X-MMOriginal-From'] = msg['from']
     del msg['from']
     msg['From'] = hs
#
     if altcharsets:
         for part in msg.walk():
             ctype = part.get_type()
             # Only text/plain parts are converted.  (This test is an
             # assumption: the condition line is missing from the post.)
             if ctype == 'text/plain':
                 # Look at this part's own Content-Type, not the
                 # top-level message's.
                 m = re.search(r'charset=["\']?([\w_-]+)',
                               part['content-type'], re.I)
                 if m:
                     charset = m.group(1).lower()
                 else:
                     charset = 'us-ascii'
                 # charset = part.get_charset()
                 if charset in altcharsets:
                     u = unicode(part.get_payload(decode=1), charset, 'replace')
                     part.set_payload(u.encode(stdcharset))
                     # set_charset cannot be used here because it may
                     # automatically convert to the mail-standard charset.
                     del part['content-type']
                     part['Content-Type'] = 'text/plain; charset=%s' % stdcharset

=================================================

With this Entry.py, you only have to check 'no subject' once.
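
For comparison, the header handling could be sketched in Python 3
syntax as below. This is an illustration, not part of the patch: the
name header_to_text and the default string are assumptions. In Python 3
the result is a str, so the "standard charset" only matters when the
text is written out again. The missing-subject default is applied once,
at the point of entry.

```python
from email.header import decode_header


def header_to_text(value, default='(no subject)'):
    """Decode RFC 2047 encoded words in a header value into one str."""
    if value is None:
        # Substitute the default once, here, so later code need not check.
        return default
    chunks = []
    for data, charset in decode_header(value):
        if isinstance(data, bytes):
            try:
                data = data.decode(charset or 'ascii', 'replace')
            except LookupError:
                # Unknown charset name: fall back to a lossy ASCII decode.
                data = data.decode('ascii', 'replace')
        chunks.append(data)
    return ''.join(chunks)


assert header_to_text(None) == '(no subject)'
assert header_to_text('plain subject') == 'plain subject'
```

A value such as 'Re: ' followed by an iso-2022-jp encoded word comes
back as a single string, with the 'Re: ' prefix and its space kept.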

I am going to upload this to SF once the exit part is done. It may be
Japanese-specific, but I will try to make it as general (i18n) as
possible.

-- 
Tokio Kikuchi, tkikuchi@ is.kochi-u.ac.jp
http://weather.is.kochi-u.ac.jp/