encoding name mappings in codecs.py with email/charset.py

gst g.starck at gmail.com
Sun Dec 14 14:53:41 EST 2014


On Sunday, 14 December 2014 14:10:22 UTC-5, Stefanos Karasavvidis wrote:
> Thanks for replying, gst.
> 
> I had already thought of patching the Charset class, but hoped for a cleaner solution.
> 
> 
> This ALIASES dict already has all the ISO names *with* a dash, so the dash must get stripped somewhere else.


Not on my side. Adding the missing key-value pairs to this dict also appears to do what you want:

Python 2.7.6 (default, Mar 22 2014, 22:59:56) 
[GCC 4.8.2] on linux2
Type "copyright", "credits" or "license()" for more information.
>>> 
>>> import email.charset
>>> email.charset.ALIASES
{'latin-8': 'iso-8859-14', 'latin-9': 'iso-8859-15', 'latin-2': 'iso-8859-2', 'latin-3': 'iso-8859-3', 'latin-1': 'iso-8859-1', 'latin-6': 'iso-8859-10', 'latin-7': 'iso-8859-13', 'latin-4': 'iso-8859-4', 'latin-5': 'iso-8859-9', 'euc_jp': 'euc-jp', 'latin-10': 'iso-8859-16', 'ascii': 'us-ascii', 'latin_10': 'iso-8859-16', 'latin_1': 'iso-8859-1', 'latin_2': 'iso-8859-2', 'latin_3': 'iso-8859-3', 'latin_4': 'iso-8859-4', 'latin_5': 'iso-8859-9', 'latin_6': 'iso-8859-10', 'latin_7': 'iso-8859-13', 'latin_8': 'iso-8859-14', 'latin_9': 'iso-8859-15', 'cp949': 'ks_c_5601-1987', 'euc_kr': 'euc-kr'}
>>> 
>>> for i in range(1, 16):
	c = 'iso-8859-' + str(i)
	email.charset.ALIASES[c] = c

	
>>> 
>>> iso7 = email.charset.Charset('iso-8859-7')
>>> iso7
iso-8859-7
>>> str(iso7)
'iso-8859-7'
>>> 
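
The same idea can be written as a small script. This is only a sketch: register_iso8859_aliases is a hypothetical helper name, not part of the email package. It assumes, as the session above suggests, that a name found in ALIASES is kept as typed, while a name found nowhere may be replaced by the codec registry's canonical name, which has no dash after "iso". It goes through email.charset.add_alias() instead of writing to the dict directly, and it skips part 12, which has no Python codec.

```python
import codecs
import email.charset

# The codec registry's canonical name for this charset has no dash
# after "iso"; this is the spelling that can leak into str(Charset(...)).
registry_name = codecs.lookup('iso-8859-7').name


def register_iso8859_aliases():
    """Map each dashed iso-8859-N name to itself (hypothetical helper)."""
    for part in range(1, 17):
        if part == 12:
            # ISO 8859-12 was abandoned and has no Python codec.
            continue
        name = 'iso-8859-%d' % part
        email.charset.add_alias(name, name)


register_iso8859_aliases()
iso7 = email.charset.Charset('iso-8859-7')
print('registry name: %s, charset name: %s' % (registry_name, iso7))
```

The aliases are process-wide, so the helper would have to run before any Charset objects are created, for example early in mailman's startup.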

regards,

gst.

> 
> sk
> 
> 
> 
> On Sun, Dec 14, 2014 at 7:21 PM, gst <g.st... at gmail.com> wrote:
> On Friday, 12 December 2014 04:21:14 UTC-5, Stefanos Karasavvidis wrote:
> 
> > I've hit a wall with mailman which seems to be caused by Python's character encoding names.
> >
> > I've narrowed the problem down to the email/charset.py file. Basically the following happens:
> >
> 
> Hi,
> 
> It's all in the email.charset.ALIASES dict.
> 
> You could also simply patch the __str__ method of Charset:
> 
> Python 2.7.6 (default, Mar 22 2014, 22:59:56)
> [GCC 4.8.2] on linux2
> Type "copyright", "credits" or "license()" for more information.
> >>>
> >>> import email.charset
> >>>
> >>> c = email.charset.Charset('iso-8859-7')
> >>> str(c)
> 'iso8859-7'
> >>>
> >>> old = email.charset.Charset.__str__
> >>>
> >>> def patched(self):
>         r = old(self)
>         # only touch dash-less names such as 'iso8859-7'; names that
>         # already read 'iso-8859-N' must not get a second dash
>         if r.startswith('iso8859'):
>                 return 'iso-' + r[3:]
>         return r
> 
> >>>
> >>> email.charset.Charset.__str__ = patched
> >>>
> >>> str(c)
> 'iso-8859-7'
> >>>
> 
> regards,
> 
> gst.
> 
> -- 
> 
> 
> ======================================================================
> Stefanos Karasavvidis,  Electronic & Computer Engineer, M.Sc.
> e-mail: s... at isc.tuc.gr, Tel.: (+30) 2821037508, Fax: (+30) 2821037520
> Technical University of Crete, Campus, Building A1


