[Mailman-Developers] encoding the subject line..

Barry A. Warsaw barry@python.org
Thu, 17 Oct 2002 12:41:10 -0400


>>>>> "TK" == Tokio Kikuchi <tkikuchi@is.kochi-u.ac.jp> writes:

    >> Barry, why is mailman encoding the subject line when it doesn't
    >> have to?
    >> Why? because the recipient prefers it.  The member has set his
    >> preferred language as non-english.

    TK> Some time ago, in CookHeaders.py, English was set to use
    TK> 'iso-8859-1' because many people like to use 8-bit characters
    TK> in mail declared as us-ascii.

But that's only because Header already tries us-ascii first, so it
wouldn't make sense to include us-ascii twice in the "try to find the
best charset" loop.

    TK> Again, I think the email module should be revised.
    TK> I ran a little experiment

    TK> in ~/pythonlib (email version is 2.4.3)

    TK> Python 2.1.3 (#1, Sep 19 2002, 17:00:05)
    TK> [GCC 2.95.2 19991024 (release)] on freebsd4
    TK> Type "copyright", "credits" or "license" for more information.
    TK> >>> from email.Header import Header
    TK> >>> a = Header('abc','iso-8859-1',70,'Subject')
    TK> >>> print a
    TK> =?iso-8859-1?q?abc?=
    TK> >>>

    TK> Or is this a version problem? I'm using Python 2.1.3 (for
    TK> Zope's sake).

No, but you're not exactly reproducing what CookHeaders.py does!  In
our case, the first argument will always be a unicode string, in which
case the charset argument is just a hint.  So this is closer to what
we really do:

>>> h = Header(u'abc', 'iso-8859-1', 70, 'Subject')
>>> print h
abc

See, no encoding!
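
The "try us-ascii first, then the hinted charset" selection described
above can be sketched in isolation.  This is only an illustration written
for current Python 3, where the module is spelled email.header; it is not
the code in CookHeaders.py, and pick_charset() and encode_subject() are
hypothetical helper names.  The final utf-8 fallback is an assumption as
well.  Current versions of the email package may not downgrade the hinted
charset on their own, so the sketch makes the choice explicitly before
building the Header:

```python
from email.header import Header


def pick_charset(text, hint):
    """Return the first charset able to encode text.

    Candidates are tried in order: us-ascii, the caller's hint, utf-8.
    (Hypothetical helper, not part of Mailman or the email package.)
    """
    for charset in ('us-ascii', hint, 'utf-8'):
        try:
            text.encode(charset)
        except (UnicodeError, LookupError):
            # Not encodable in this charset, or unknown charset name.
            continue
        return charset
    # Only reached for text that even utf-8 rejects.
    return 'utf-8'


def encode_subject(text, hint='iso-8859-1'):
    """Build a Subject header value using the narrowest workable charset."""
    charset = pick_charset(text, hint)
    return Header(text, charset, 70, 'Subject').encode()


# Pure ASCII text is left alone; the second subject needs the hinted charset.
print(encode_subject('abc'))
print(encode_subject('caf\xe9'))
```

With this ordering a plain ASCII subject is never wrapped in an encoded
word, whatever language the member prefers; the hinted charset only comes
into play once the text contains a character that us-ascii cannot
represent.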

I just realized that we're probably not doing the string -> unicode
conversion correctly in CookHeaders.py, prefix_subject().  I'll check
in a fix for that, but for me it makes no difference in reproducing
Chuq's problem.

-Barry