Another 2 to 3 mail encoding problem

Richard Damon Richard at Damon-Family.org
Thu Aug 27 09:01:11 EDT 2020


On 8/27/20 4:31 AM, Chris Green wrote:
> While an E-Mail body possibly *shouldn't* have non-ASCII characters in
> it one must be able to handle them without errors.  In fact haven't
> the RFCs changed such that the message body should be 8-bit clean?
> Anyway I think the Python 3 mail handling libraries need to be able to
> pass extended characters through without errors.

Email messages are fully allowed to contain non-ASCII characters, as
long as the headers indicate this. They can be sent either as raw 8-bit
bytes on systems that are 8-bit clean or, for systems that are not,
encoded as base64 or quoted-printable. The resulting bytes are to be
interpreted in the character set defined (or presumed) in the headers,
or as some other binary object, like an image or an executable, if the
content type isn't text.
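
As a rough illustration of the three transfer encodings mentioned above,
here is a minimal sketch using the Python 3 standard library email
package. The helper name round_trip and the sample text are made up for
this example; the cte argument is the one accepted by
EmailMessage.set_content for text bodies.

```python
import email
from email import policy
from email.message import EmailMessage

TEXT = "Grüße aus Köln\n"  # sample body, chosen only for illustration


def round_trip(cte):
    """Build a message with the given Content-Transfer-Encoding,
    serialize it to bytes, then parse it back (hypothetical helper)."""
    msg = EmailMessage()
    msg["Subject"] = "encoding test"
    # The charset is recorded in the Content-Type header, the transfer
    # encoding in the Content-Transfer-Encoding header.
    msg.set_content(TEXT, charset="utf-8", cte=cte)
    raw = msg.as_bytes()
    parsed = email.message_from_bytes(raw, policy=policy.default)
    return raw, parsed


for cte in ("8bit", "base64", "quoted-printable"):
    raw, parsed = round_trip(cte)
    # Only the 8bit form should carry bytes above 127 on the wire.
    has_high_bytes = any(b > 127 for b in raw)
    print(cte, parsed["Content-Type"], has_high_bytes)
    print(repr(parsed.get_content()))
```

In each case the receiver recovers the same text, because the headers
say both how the bytes were wrapped and which character set they are in.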

Because of this, the Python 3 str type is not suitable for storing a raw
email message, since it insists on its contents being Unicode text,
whereas the Python 2 str class, being a plain byte string, could hold it.
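
One way to act on this in Python 3, sketched here under the assumption
that the message arrives as raw bytes (for example read from a file
opened in binary mode), is to keep it as bytes and let the email package
decode each part according to its own headers. The sample message and
addresses below are invented for the example.

```python
import email
from email import policy

# A raw message whose body is ISO-8859-1 sent as 8bit (invented sample).
raw = (
    b"From: sender@example.com\r\n"
    b"Subject: test\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=iso-8859-1\r\n"
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    b"caf\xe9\r\n"
)

# Treating the whole message as UTF-8 text fails: the byte 0xE9 is not
# valid UTF-8 in this position.
try:
    raw.decode("utf-8")
    whole_decode_ok = True
except UnicodeDecodeError:
    whole_decode_ok = False

# Parsing from bytes keeps the body intact; get_content() then decodes
# it using the charset declared in the Content-Type header.
msg = email.message_from_bytes(raw, policy=policy.default)
body = msg.get_content()

print(whole_decode_ok)
print(repr(body))
```

The design choice here is simply to hold the message in a bytes object
until the headers have been read, and to produce str only per part.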

-- 
Richard Damon


