[ python-Bugs-1555842 ] email package and Unicode strings handling

SourceForge.net noreply at sourceforge.net
Sun Sep 10 19:35:46 CEST 2006


Bugs item #1555842, was opened at 2006-09-10 16:04
Message generated for change (Comment added) made by manlioperillo
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1555842&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Manlio Perillo (manlioperillo)
Assigned to: Nobody/Anonymous (nobody)
Summary: email package and Unicode strings handling

Initial Comment:
The support for Unicode strings in the email package
(notably the MIMEText and Header classes) is not uniform.

The behaviour with Unicode strings in Header is
documented but the interface is not good.

This code works, but it should not:

>>> from email import Header, Message
>>> h = Header.Header(u"àèìòù", charset="us-ascii")
>>> m = Message.Message()
>>> m["Subject"] = h
>>> print m.as_string()


Allowing this to work can cause confusion: I am saying
that the charset is us-ascii, not utf-8.
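
The same check can be sketched for a current interpreter. This assumes Python 3, where the modules are named email.header and email.message (the session above uses the Python 2.4 names), and where Header appears to keep the same fallback: text that cannot be encoded as us-ascii is emitted as utf-8 instead of being rejected.

```python
from email.header import Header
from email.message import Message

# Assumption: Python 3 module names; the report itself uses Python 2.4.
text = "àèìòù"

# The caller declares us-ascii, but the text is not ASCII.
h = Header(text, charset="us-ascii")
encoded = h.encode()

m = Message()
m["Subject"] = h
rendered = m.as_string()

# The encoded word names the charset that was actually used.
print(encoded)
print(rendered)
```

If the fallback is in effect, the printed encoded word names utf-8 even though us-ascii was requested, which is the confusion described above.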

With MIMEText I obtain:

>>> from email import MIMEText
>>> m = MIMEText.MIMEText(u"àèìòù", _charset="us-ascii")
>>> print m.as_string()

[ exception ]


I think that the correct behaviour (for all functions
accepting strings) is:

- Do not accept plain str strings (8-bit);
  accept them only if they are plain ASCII (7-bit).
- The charset specified should not be treated as a
  hint, but as the charset I want to be used.
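
The two rules above can be sketched as a small wrapper. This is only an illustration under stated assumptions: it targets Python 3 (bytes stands in for the 8-bit str of Python 2), and checked_text and strict_header are hypothetical helper names, not part of the email package.

```python
from email.header import Header


def checked_text(value, charset):
    """Hypothetical helper: apply the two proposed rules to one string."""
    if isinstance(value, bytes):
        # Rule 1: byte strings are accepted only if they are 7-bit ASCII.
        # A non-ASCII byte raises UnicodeDecodeError here.
        value = value.decode("ascii")
    # Rule 2: the charset is binding, not a hint. Text that does not fit
    # raises UnicodeEncodeError instead of switching to another charset.
    value.encode(charset)
    return value


def strict_header(value, charset="us-ascii"):
    """Hypothetical helper: build a Header only after the checks pass."""
    return Header(checked_text(value, charset), charset=charset)


print(strict_header("àèìòù", "utf-8").encode())

try:
    strict_header("àèìòù", "us-ascii")
except UnicodeEncodeError as exc:
    print("rejected:", exc.reason)
```

With such a check in front of the constructor, a mismatch between the text and the declared charset fails early instead of producing a header in a different charset.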



Regards  Manlio Perillo

----------------------------------------------------------------------

>Comment By: Manlio Perillo (manlioperillo)
Date: 2006-09-10 17:35

Message:
Logged In: YES 
user_id=1054957

The last example is not right.
Here is the correct one:

>>> m = MIMEText.MIMEText(u"àèìòù", _charset="utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "C:\Python2.4\lib\email\MIMEText.py", line 28, in __init__
    self.set_payload(_text, _charset)
  File "C:\Python2.4\lib\email\Message.py", line 218, in set_payload
    self.set_charset(charset)
  File "C:\Python2.4\lib\email\Message.py", line 260, in set_charset
    self._payload = charset.body_encode(self._payload)
  File "C:\Python2.4\lib\email\Charset.py", line 366, in body_encode
    return email.base64MIME.body_encode(s)
  File "C:\Python2.4\lib\email\base64MIME.py", line 136, in encode
    enc = b2a_base64(s[i:i + max_unencoded])
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)


So it seems that email.Message does not handle Unicode strings.

The code works if I set the charset to latin-1.
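
For comparison, here is a minimal sketch of the same utf-8 call on a current interpreter. It assumes Python 3, where the class lives in email.mime.text and all text strings are Unicode; it says nothing about how Python 2.4 behaves.

```python
from email.mime.text import MIMEText

# Assumption: Python 3 module name; the traceback above is from Python 2.4.
text = "àèìòù"
m = MIMEText(text, _charset="utf-8")

# get_payload(decode=True) undoes the transfer encoding and returns bytes.
raw = m.get_payload(decode=True)

print(m["Content-Type"])
print(raw.decode("utf-8") == text)
```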

----------------------------------------------------------------------
