[Python-checkins] r81705 - in python/trunk: Lib/email/charset.py Lib/email/test/test_email.py Misc/NEWS

r.david.murray python-checkins at python.org
Fri Jun 4 21:51:06 CEST 2010


Author: r.david.murray
Date: Fri Jun  4 21:51:06 2010
New Revision: 81705

Log:
#4487: have Charset check with codecs for possible aliases.

Previously, unexpected results occurred when email was passed, for example,
'utf8' as a charset name, since email would accept it but would *not* use
the 'utf-8' codec for it, even though Python itself recognises that as
an alias for utf-8.  Now Charset checks with codecs for aliases as well
as its own internal table.  Issue 8898 has been opened to change this
further in py3k so that all aliasing is routed through the codecs module.
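
The lookup order described above can be sketched outside the email package.
This is only an illustration, not the code from this revision: the function
name resolve_charset is hypothetical, and the two small tables are stand-ins
for the much larger ALIASES and CHARSETS tables in email/charset.py.  The only
library call it relies on is codecs.lookup(), which returns a CodecInfo whose
.name attribute is the codec's canonical name.

```python
import codecs

# Hypothetical, heavily trimmed stand-ins for the ALIASES and CHARSETS
# tables in email/charset.py (assumption: the real tables are larger).
ALIASES = {'latin_1': 'iso-8859-1', 'ascii': 'us-ascii'}
CHARSETS = {'iso-8859-1': None, 'us-ascii': None, 'utf-8': None}


def resolve_charset(name):
    """Return the charset name after consulting both alias sources."""
    name = name.lower()
    # Ask the codecs module only when the email tables do not know the name.
    if name not in ALIASES and name not in CHARSETS:
        try:
            name = codecs.lookup(name).name
        except LookupError:
            # Unknown to codecs as well: keep the name as given.
            pass
    # The email alias table still gets the final say.
    return ALIASES.get(name, name)


if __name__ == '__main__':
    for candidate in ('utf8', 'UTF8', 'latin_1', 'no-such-charset'):
        print(candidate, '->', resolve_charset(candidate))
```

A name that neither table nor the codecs module recognises passes through
unchanged, which matches the "except LookupError: pass" branch in the patch.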


Modified:
   python/trunk/Lib/email/charset.py
   python/trunk/Lib/email/test/test_email.py
   python/trunk/Misc/NEWS

Modified: python/trunk/Lib/email/charset.py
==============================================================================
--- python/trunk/Lib/email/charset.py	(original)
+++ python/trunk/Lib/email/charset.py	Fri Jun  4 21:51:06 2010
@@ -9,6 +9,7 @@
     'add_codec',
     ]
 
+import codecs
 import email.base64mime
 import email.quoprimime
 
@@ -209,7 +210,12 @@
         except UnicodeError:
             raise errors.CharsetError(input_charset)
         input_charset = input_charset.lower()
-        # Set the input charset after filtering through the aliases
+        # Set the input charset after filtering through the aliases and/or codecs
+        if not (input_charset in ALIASES or input_charset in CHARSETS):
+            try:
+                input_charset = codecs.lookup(input_charset).name
+            except LookupError:
+                pass
         self.input_charset = ALIASES.get(input_charset, input_charset)
         # We can try to guess which encoding and conversion to use by the
         # charset_map dictionary.  Try that first, but let the user override

Modified: python/trunk/Lib/email/test/test_email.py
==============================================================================
--- python/trunk/Lib/email/test/test_email.py	(original)
+++ python/trunk/Lib/email/test/test_email.py	Fri Jun  4 21:51:06 2010
@@ -2868,6 +2868,9 @@
         self.assertEqual(str(charset), 'us-ascii')
         self.assertRaises(Errors.CharsetError, Charset, 'asc\xffii')
 
+    def test_codecs_aliases_accepted(self):
+        charset = Charset('utf8')
+        self.assertEqual(str(charset), 'utf-8')
 
 
 # Test multilingual MIME headers.

Modified: python/trunk/Misc/NEWS
==============================================================================
--- python/trunk/Misc/NEWS	(original)
+++ python/trunk/Misc/NEWS	Fri Jun  4 21:51:06 2010
@@ -46,6 +46,9 @@
 Library
 -------
 
+- Issue #4487: email now accepts as charset aliases all codec aliases
+  accepted by the codecs module.
+
 - Issue #6470: Drop UNC prefix in FixTk.
 
 - Issue #5610: feedparser no longer eats extra characters at the end of

