[issue8054] "as_string" method in email's mime objects encode text segmentedly

Fri Mar 5 15:14:08 CET 2010

R. David Murray <rdmurray at bitdance.com> added the comment:

We don't fully support setting defaultencoding to anything other than ASCII.  The test suite doesn't fully pass, for example, if defaultencoding is set to 'utf-8' in site.py.

But that aside, the documentation for MIMEText says: "No guessing or encoding is performed on the text data."  In your first example you are passing it unicode, which is un-encoded.  It might be helpful if it raised a ValueError when passed unicode, but it isn't technically a bug that it doesn't, since it does raise an error for non-ascii unicode if you haven't changed defaultencoding.  The behavior also can't be changed, since existing code may depend on being able to pass ascii-only unicode strings in and having them auto-coerced to ascii.

Note that the cause of the problem is that the email transport encoder assumes its input is binary data and breaks it up into appropriately sized lines by counting bytes.  You've fed it a unicode string, which it winds up breaking up by *unicode* character count.  Each line is then passed to binascii.b2a_base64, which, given the non-standard defaultencoding, coerces it to utf-8.  The utf-8 form contains a different number of bytes than the original character count, and those bytes are then encoded in base64, giving you the uneven length lines in the final output.
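
A minimal sketch of that mechanism, written for Python 3 so the utf-8 coercion has to be spelled out explicitly.  It is an illustration and not the email package's actual code: the helper names encode_by_chars and encode_by_bytes are hypothetical, and the chunk size of 57 is an assumption taken from the standard base64 module (57 input bytes give one 76-character line).

```python
import binascii

CHUNK = 57  # assumed chunk size: 57 input bytes -> 76 base64 characters


def encode_by_chars(text, chunk=CHUNK):
    """Mimic the failure: slice by character count, then coerce each slice to utf-8."""
    lines = []
    for start in range(0, len(text), chunk):
        piece = text[start:start + chunk].encode("utf-8")  # byte count now differs
        lines.append(binascii.b2a_base64(piece).decode("ascii").rstrip("\n"))
    return lines


def encode_by_bytes(text, chunk=CHUNK):
    """Encode to utf-8 first, then slice by byte count."""
    data = text.encode("utf-8")
    lines = []
    for start in range(0, len(data), chunk):
        piece = data[start:start + chunk]
        lines.append(binascii.b2a_base64(piece).decode("ascii").rstrip("\n"))
    return lines


# 40 two-byte characters followed by 74 one-byte characters:
# 114 characters, 154 bytes in utf-8.
sample = "\u00e9" * 40 + "a" * 74

print([len(line) for line in encode_by_chars(sample)])
print([len(line) for line in encode_by_bytes(sample)])
```

Slicing by characters lets the first slice grow to 97 bytes once it is coerced, so its base64 line is far longer than the next one.  Slicing the already-encoded bytes gives uniform 76-character lines with only a short final line.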

In Python 3 this isn't a problem, since you can't accidentally mix up unicode and bytes in Python 3.
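
As a rough illustration of the Python 3 behaviour, the sketch below passes a str and an explicit charset to MIMEText and inspects the resulting payload.  The sample text is made up for the example, and the charset argument is an assumption about how a caller would normally use the class, not something taken from the original report.

```python
from email.mime.text import MIMEText

# Made-up sample: 100 characters, 200 bytes once encoded as utf-8.
text = "\u00e9" * 100

# With an explicit charset the text is encoded to bytes before the
# base64 transfer encoding splits it into lines.
msg = MIMEText(text, "plain", "utf-8")

payload_lines = msg.get_payload().splitlines()
print(msg["Content-Transfer-Encoding"])
print([len(line) for line in payload_lines])

# Decoding the payload should give back the original text.
decoded = msg.get_payload(decode=True).decode("utf-8")
print(decoded == text)
```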

----------
resolution:  -> wont fix
stage: test needed -> committed/rejected
status: open -> closed

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue8054>
_______________________________________