encoding name mappings in codecs.py with email/charset.py

Stefanos Karasavvidis sk at isc.tuc.gr
Fri Dec 12 04:05:17 EST 2014


I've hit a wall with Mailman, which seems to be caused by Python's character
encoding names.

I've narrowed the problem down to the email/charset.py file. Basically the
following happens:

Given an encoding name such as 'iso-8859-X', it is transformed to 'iso8859-X'
(without the first dash). This happens with Python 2.7 but not with Python
3.4. Microsoft Exchange doesn't accept the form without the dash and bounces
the emails sent by Mailman, and Mailman doesn't run on Python 3.x, so I can't
simply switch versions.

This transformation is done in charset.py by the following line:

   input_charset = codecs.lookup(input_charset).name
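To confirm that the dash is dropped inside `codecs` itself rather than somewhere else in the email package, here is a minimal sketch (the list of spellings is just my own choice of sample inputs). It uses only `codecs.lookup()` and the `.name` attribute of the `CodecInfo` object it returns. As far as I can tell it behaves the same under 2.7 and 3.4, which would suggest that the 3.4 difference comes from 3.4's charset.py no longer passing the name through `codecs.lookup()`.

```python
import codecs

# Spellings of the Greek ISO-8859 charset to try (my own sample inputs).
spellings = ['iso-8859-7', 'ISO-8859-7', 'iso_8859_7', 'iso8859-7']

for name in spellings:
    # codecs.lookup() returns a CodecInfo object; its .name attribute is the
    # codec's own canonical name, not the spelling that was passed in.
    canonical = codecs.lookup(name).name
    print('%-12s -> %s' % (name, canonical))
```

Every line should end in iso8859-7, whichever spelling goes in.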

The following code demonstrates the issue:
   from email.charset import Charset
   charset = Charset('iso-8859-7')
   print(str(charset))

In Python 2.7, iso8859-7 is printed; in Python 3.4, iso-8859-7 is printed.

I tried to find where these mappings are defined in codecs.py, but it seems
to rely on some internal mapping that I couldn't locate. I'm also not 100%
sure that this isn't OS-related.
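The "internal mapping" appears to be plain Python data in the standard library's `encodings` package: `encodings/aliases.py` maps normalized spellings to codec module names, and each codec module (here `encodings/iso8859_7.py`) declares the canonical name in its `getregentry()`. As far as I know this data ships with Python itself, so it should be the same on every OS. The sketch below walks that chain by hand; it mirrors what the encodings search function does rather than calling its private internals.

```python
import encodings
import encodings.aliases
import encodings.iso8859_7

name = 'iso-8859-7'

# Step 1: punctuation in the requested name is collapsed to underscores.
normalized = encodings.normalize_encoding(name)

# Step 2: encodings/aliases.py maps the normalized spelling to a module name
# (falling back to the normalized spelling itself if there is no alias).
module_name = encodings.aliases.aliases.get(normalized, normalized)

# Step 3: the codec module's getregentry() returns a CodecInfo whose .name is
# what codecs.lookup(name).name reports, i.e. the dash-less form.
canonical = encodings.iso8859_7.getregentry().name

print('%s -> %s -> %s' % (normalized, module_name, canonical))
```

The three steps should yield iso_8859_7, iso8859_7 and iso8859-7 respectively.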

So the question is basically whether there is a way to change the name
mappings that the codecs module applies.
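Rather than changing `codecs`, one untested workaround might be to stop the email package from calling `codecs.lookup()` for these names at all. This sketch rests on an assumption from my reading of the 2.7 charset.py (worth double-checking against the installed copy): the lookup is only done for names that are not already keys of `email.charset.ALIASES` or `email.charset.CHARSETS`, and if I read it correctly `iso-8859-7` is in neither. `email.charset.add_alias()` is a documented function in both 2.7 and 3.x; the loop over ISO-8859 parts is only my illustration. It would need to run before Mailman creates any `Charset` objects; where to hook it in is left open.

```python
import email.charset
from email.charset import Charset

# Assumption: Python 2.7's Charset.__init__ only calls codecs.lookup() for
# names that are NOT already keys of email.charset.ALIASES or CHARSETS.
# Registering each dashed ISO-8859 name as an alias of itself should then
# make it skip the lookup and keep the dash.
for part in range(1, 17):
    if part == 12:
        continue  # there is no ISO-8859-12
    dashed = 'iso-8859-%d' % part
    email.charset.add_alias(dashed, dashed)

print(str(Charset('iso-8859-7')))
```

Under Python 3.x this prints iso-8859-7 with or without the aliases, so it is harmless there; under 2.7 it should also print iso-8859-7 if the assumption above holds.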

My environment is Ubuntu 14.04:

   $ python2.7 --version
   Python 2.7.6

   $ python3.4 --version
   Python 3.4.0

-- 
======================================================================
Stefanos Karasavvidis,  Electronic & Computer Engineer, M.Sc.
e-mail: sk at isc.tuc.gr, Tel.: (+30) 2821037508, Fax: (+30) 2821037520
Technical University of Crete, Campus, Building A1


More information about the Python-list mailing list