[Mailman-Developers]
output/input charset (was Re: encoding the subject line..)
Tokio Kikuchi
tkikuchi@is.kochi-u.ac.jp
Fri, 18 Oct 2002 15:53:40 +0900
Hi!
Sorry for confusing the discussion on the subject line matter.
While I was examining CookHeaders.py myself, I noticed a possible
error in the charset determination.
Here is the diff:
diff -u CookHeaders.py.orig CookHeaders.py
--- CookHeaders.py.orig Fri Oct 18 15:34:47 2002
+++ CookHeaders.py Fri Oct 18 15:36:06 2002
@@ -220,7 +220,7 @@
     # If prefix is a byte string and there are funky characters in it that
     # don't match the charset, we might as well replace them now.
     if not _isunicode(prefix):
-        prefix = unicode(prefix, charset.get_output_charset(), 'replace')
+        prefix = unicode(prefix, charset.input_charset, 'replace')
     # We purposefully leave no space b/w prefix and subject!
     h = Header(prefix, charset, header_name='Subject')
     for s, c in headerbits:
Explanation:
There is one language where output_charset and input_charset differ:
Japanese. The web interface and internal processing use 'euc-jp', while
mail uses 'iso-2022-jp'. The prefix was entered through the web interface,
so it is encoded in 'euc-jp', not 'iso-2022-jp'. Decoding it to unicode
with output_charset therefore produces garbled ('funny') characters. It
may be possible to pass `charset` itself instead of charset.input_charset,
but I'm not sure.
There is a similar error in email/Header.py as well. Please check.
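
To illustrate the mismatch, here is a minimal sketch. It assumes Python 3 and the standard library's email.charset module (the patch itself targets the Python 2 `unicode()` call), and the prefix text is a made-up example, not taken from any real list configuration. It decodes the same 'euc-jp' bytes once with input_charset and once with the output charset:

```python
from email.charset import Charset

# For Japanese, the email package maps input 'euc-jp' to output 'iso-2022-jp'.
charset = Charset('euc-jp')

# Hypothetical subject prefix, stored as EUC-JP bytes as the web interface would.
prefix_text = '[テスト] '
prefix_bytes = prefix_text.encode('euc-jp')

# Decoding with the input charset recovers the original text.
right = prefix_bytes.decode(charset.input_charset, 'replace')

# Decoding with the output charset treats the EUC-JP bytes as ISO-2022-JP,
# so the Japanese characters are lost to the 'replace' error handler.
wrong = prefix_bytes.decode(charset.get_output_charset(), 'replace')

print(charset.input_charset, charset.get_output_charset())
print(repr(right))
print(repr(wrong))
```

Only the decode step differs between the two results, which is the same one-argument change the diff makes.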
--
Tokio Kikuchi, tkikuchi@ is.kochi-u.ac.jp
http://weather.is.kochi-u.ac.jp/