[Python-checkins] python/dist/src/Lib/email Charset.py,1.13,1.14

Tue Dec 30 11:52:29 EST 2003

Update of /cvsroot/python/python/dist/src/Lib/email
In directory sc8-pr-cvs1:/tmp/cvs-serv3738

Modified Files:
	Charset.py 
Log Message:
Fixes to support CJKCodecs as per SF bug #852347.  Actually, this
patch removes dependencies on the old unsupported KoreanCodecs package
and the alternative JapaneseCodecs package.  Since both of those
provide aliases for their codecs, this removal just makes the generic
codec names work.

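We needed to make slight changes to __init__() as well: when no
conversion charset is registered it now defaults to the input charset,
and the output codec falls back to the output charset rather than to
the input codec.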

This will be backported to Python 2.3 when its branch freeze is over.
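
As a rough illustration of the lookup behaviour described above, here is a
minimal standalone sketch (not the module's actual code). The trimmed-down
CHARSETS and CODEC_MAP tables and the resolve_codecs() helper are
hypothetical names introduced only for this example; the fallback logic
mirrors the hunks in the diff below.

```python
# Hypothetical, trimmed-down tables; the real module defines more entries.
CHARSETS = {
    # input charset: (header encoding, body encoding, conversion charset)
    'euc-jp': ('base64', None, 'iso-2022-jp'),
}

# Only charsets whose codec name differs from the charset name need an entry.
CODEC_MAP = {
    'gb2312': 'eucgb2312_cn',
    'big5': 'big5_tw',
}


def resolve_codecs(input_charset):
    """Return (input_codec, output_codec) for a charset name.

    Illustrative helper, not part of the email package.
    """
    henc, benc, conv = CHARSETS.get(input_charset,
                                    ('shortest', 'base64', None))
    # No conversion registered: output in the same charset as the input.
    if not conv:
        conv = input_charset
    output_charset = conv
    # Fall back to the generic charset name when no codec mapping exists.
    input_codec = CODEC_MAP.get(input_charset, input_charset)
    output_codec = CODEC_MAP.get(output_charset, output_charset)
    return input_codec, output_codec


if __name__ == '__main__':
    for name in ('euc-kr', 'gb2312', 'euc-jp'):
        print(name, resolve_codecs(name))
```

With this arrangement a charset such as 'euc-kr' needs no table entry at
all, provided an installed codec package registers that generic name.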

Index: Charset.py
===================================================================
RCS file: /cvsroot/python/python/dist/src/Lib/email/Charset.py,v
retrieving revision 1.13
retrieving revision 1.14
diff -C2 -d -r1.13 -r1.14
*** Charset.py	6 Mar 2003 05:16:29 -0000	1.13
--- Charset.py	30 Dec 2003 16:52:25 -0000	1.14
***************
*** 2,5 ****
--- 2,16 ----
  # Author: che at debian.org (Ben Gertzfield), barry at zope.com (Barry Warsaw)

+ # Python 2.3 doesn't come with any Asian codecs by default.  Two packages are
+ # currently available and supported as of this writing (30-Dec-2003):
+ #
+ # CJKCodecs
+ # http://cjkpython.i18n.org
+ # This package contains Chinese, Japanese, and Korean codecs
+ 
+ # JapaneseCodecs
+ # http://www.asahi-net.or.jp/~rd6t-kjym/python
+ # Some Japanese users prefer this codec package
+ 
  from types import UnicodeType
  from email.Encoders import encode_7or8bit
***************
*** 89,113 ****
      }

- # Map charsets to their Unicode codec strings.  Note that Python doesn't come
- # with any Asian codecs by default.  Here's where to get them:
- #
- # Japanese -- http://www.asahi-net.or.jp/~rd6t-kjym/python
- # Korean   -- http://sf.net/projects/koco
- # Chinese  -- http://sf.net/projects/python-codecs
- #
- # Note that these codecs have their own lifecycle and may be in varying states
- # of stability and useability.

  CODEC_MAP = {
!     'euc-jp':      'japanese.euc-jp',
!     'iso-2022-jp': 'japanese.iso-2022-jp',
!     'shift_jis':   'japanese.shift_jis',
!     'euc-kr':      'korean.euc-kr',
!     'ks_c_5601-1987': 'korean.cp949',
!     'iso-2022-kr': 'korean.iso-2022-kr',
!     'johab':       'korean.johab',
!     'gb2132':      'eucgb2312_cn',
      'big5':        'big5_tw',
-     'utf-8':       'utf-8',
      # Hack: We don't want *any* conversion for stuff marked us-ascii, as all
      # sorts of garbage might be sent to us in the guise of 7-bit us-ascii.
--- 100,108 ----
      }

+ # Map charsets to their Unicode codec strings.
  CODEC_MAP = {
!     'gb2312':      'eucgb2312_cn',
      'big5':        'big5_tw',
      # Hack: We don't want *any* conversion for stuff marked us-ascii, as all
      # sorts of garbage might be sent to us in the guise of 7-bit us-ascii.
***************
*** 221,224 ****
--- 216,221 ----
          henc, benc, conv = CHARSETS.get(self.input_charset,
                                          (SHORTEST, BASE64, None))
+         if not conv:
+             conv = self.input_charset
          # Set the attributes, allowing the arguments to override the default.
          self.header_encoding = henc
***************
*** 230,234 ****
                                           self.input_charset)
          self.output_codec = CODEC_MAP.get(self.output_charset,
!                                             self.input_codec)

      def __str__(self):
--- 227,231 ----
                                           self.input_charset)
          self.output_codec = CODEC_MAP.get(self.output_charset,
!                                             self.output_charset)

      def __str__(self):