[Mailman-i18n] Pipermail and non-English lists

Barry A. Warsaw barry@python.org
Thu Nov 21 13:47:46 2002


>>>>> "TK" == Tokio Kikuchi <tkikuchi@is.kochi-u.ac.jp> writes:

    >> I would add this issue to the installation instructions. People
    >> who use non-Latin-1 regularly will notice quickly that they get
    >> mojibake.

    TK> BTW, I propose each language define 'alternative charsets'
    TK> which are safely convertible to the 'standard charset' for the
    TK> language. On entry to the global pipeline, the incoming message
    TK> is examined and converted to the stdcharset if possible. Name
    TK> the module Entry.py; a rough algorithm might look like this:

    | if mlist.preferred_language has altcharsets \
    |     and msg.charset in altcharsets:
    |         msg = unicode(msg, msg.charset).encode(stdcharset)

    TK> 'replace' may be needed for safety.

    TK> Japanese needs this because three charsets are in wide
    TK> use.
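
To make the quoted proposal concrete, here is a minimal sketch of the
conversion step in present-day Python (the original pseudocode uses the
Python 2 unicode() built-in). Everything named here is an assumption for
illustration only: the ALT_CHARSETS table, the to_std_charset() function,
and the choice of euc-jp as the standard charset with shift_jis and
iso-2022-jp as alternatives. None of it is existing Mailman code.

```python
import codecs
from typing import Dict, Tuple

# Hypothetical per-language table: standard charset plus the alternative
# charsets assumed to be safely convertible to it.
ALT_CHARSETS: Dict[str, Tuple[str, Tuple[str, ...]]] = {
    "ja": ("euc-jp", ("shift_jis", "iso-2022-jp")),
}


def _canonical(charset: str) -> str:
    """Return Python's canonical codec name, or '' if the charset is unknown."""
    try:
        return codecs.lookup(charset).name
    except LookupError:
        return ""


def to_std_charset(payload: bytes, charset: str, language: str) -> Tuple[bytes, str]:
    """Convert payload to the language's standard charset when possible.

    Returns (payload, charset) unchanged if the language defines no
    alternative charsets or the message charset is not one of them.
    """
    entry = ALT_CHARSETS.get(language)
    if entry is None:
        return payload, charset
    std, alts = entry
    name = _canonical(charset)
    if not name or name not in {_canonical(a) for a in alts}:
        return payload, charset
    # 'replace' keeps the conversion from raising on unmappable bytes,
    # as suggested in the quoted message.
    text = payload.decode(charset, errors="replace")
    return text.encode(std, errors="replace"), std
```

A caller would pass the raw message body, the charset declared in its
Content-Type header, and the list's preferred language, then store the
returned bytes and charset back on the message.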

Ouch.  This sounds like something we probably need to do if Japanese
archives are going to come out right, but it also sounds like a fair
bit of work.

Why does it need to be part of the global pipeline?  I'd imagine it's
only necessary where email messages might get displayed over HTTP, so
that'd mean Pipermail and the admindb.

Most languages aren't going to need alternative charsets, right?  I'm
guessing Japanese and perhaps other Asian languages (but I don't know
for sure).  What would be the value of altcharsets for Japanese?

-Barry


