[Mailman-Developers] output/input charset (was Re: encoding the subject line..)

Tokio Kikuchi tkikuchi@is.kochi-u.ac.jp
Fri, 18 Oct 2002 15:53:40 +0900


Hi!

Sorry for muddying the discussion on the subject-line issue.
While I was examining CookHeaders.py myself, I noticed a possible
error in the charset determination.

Here is the diff:

diff -u CookHeaders.py.orig CookHeaders.py
--- CookHeaders.py.orig Fri Oct 18 15:34:47 2002
+++ CookHeaders.py      Fri Oct 18 15:36:06 2002
@@ -220,7 +220,7 @@
      # If prefix is a byte string and there are funky characters in it that
      # don't match the charset, we might as well replace them now.
      if not _isunicode(prefix):
-        prefix = unicode(prefix, charset.get_output_charset(), 'replace')
+        prefix = unicode(prefix, charset.input_charset, 'replace')
      # We purposefully leave no space b/w prefix and subject!
      h = Header(prefix, charset, header_name='Subject')
      for s, c in headerbits:

Explanation:
There is one language where output_charset and input_charset differ:
Japanese. The web interface and internal processing use 'euc-jp', while
outgoing mail uses 'iso-2022-jp'. The prefix is encoded in 'euc-jp'
because it was entered through the web interface, not in 'iso-2022-jp',
so decoding it to unicode with output_charset produces garbled
('funnier') characters. charset.input_charset could perhaps be replaced
with `charset` itself, but I'm not sure. There is a similar error in
email/Header.py as well. Please check.
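To make the mismatch concrete, here is a small sketch using Python 3's
email.charset module. The thread concerns the Python 2 email package
bundled with Mailman, so treat this as an illustration, not the Mailman
code. It assumes a hypothetical list prefix "[テスト] " stored as EUC-JP
bytes (as if typed into the web interface) and compares decoding it with
the output charset (old code) versus the input charset (patched code).

```python
from email.charset import Charset

# Hypothetical subject prefix, stored as EUC-JP bytes as if entered via the web UI
prefix_text = "[テスト] "
prefix_bytes = prefix_text.encode("euc-jp")

# For Japanese, the email package maps input euc-jp to output iso-2022-jp
charset = Charset("euc-jp")
print(charset.input_charset)         # euc-jp
print(charset.get_output_charset())  # iso-2022-jp

# Old behaviour: decode with the output charset. The 8-bit EUC-JP bytes are
# not valid 7-bit ISO-2022-JP, so the text does not survive the decode.
wrong = prefix_bytes.decode(charset.get_output_charset(), "replace")

# Patched behaviour: decode with the input charset, which matches the bytes
right = prefix_bytes.decode(charset.input_charset, "replace")

print(right == prefix_text)  # True
print(wrong == prefix_text)  # False
```

Only the input charset describes how the stored prefix bytes were actually
encoded; the output charset matters later, when the Header is encoded for
the outgoing message.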

-- 
Tokio Kikuchi, tkikuchi@ is.kochi-u.ac.jp
http://weather.is.kochi-u.ac.jp/