[Python-3000] Questions about email bytes/str (python 3000)

Bill Janssen janssen at parc.com
Wed Aug 15 03:44:54 CEST 2007


> > Let's take an example: multipart (MIME) email with latin-1 and base64
> > (ascii) sections. Mix latin-1 and ascii => mix bytes. So the best type
> > should be bytes.
> >
> > => bytes
> 
> Except that by the time they're parsed into an email message, they
> must be ascii, either encoded as base64 or quoted-printable.  We also
> have to know at that point the charset being used, so I think it
> makes sense to keep everything as strings.

Actually, Victor's right here: it makes more sense to treat them as
bytes.  It's RFC 821 (SMTP) that requires 7-bit ASCII, not the MIME
format.  Non-SMTP mail transports do exist, and are popular in various
places.  Email carried over such transports may, for instance, use a
Content-Transfer-Encoding of "binary" for some sections of the
message.  Some parts of the top-level message header can be counted
on to be ASCII, but the message as a whole cannot.
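A minimal sketch of the point above, assuming Python 3's `email` package (which postdates this thread). The message below is made up for illustration: one part carries a raw latin-1 body with Content-Transfer-Encoding "binary", the other is base64. The raw message is not valid ASCII, so it can only be held losslessly as bytes; `email.message_from_bytes` parses it, and `get_payload(decode=True)` hands back each part's body as bytes.

```python
import email

# Hypothetical two-part message, as it might arrive over a non-SMTP
# transport that permits a raw 8-bit ("binary") section.
# b"\xe9" is "e acute" in latin-1; "aGk=" is base64 for b"hi".
RAW = (
    b"MIME-Version: 1.0\n"
    b'Content-Type: multipart/mixed; boundary="XYZ"\n'
    b"\n"
    b"--XYZ\n"
    b'Content-Type: text/plain; charset="latin-1"\n'
    b"Content-Transfer-Encoding: binary\n"
    b"\n"
    b"caf\xe9\n"
    b"--XYZ\n"
    b"Content-Type: application/octet-stream\n"
    b"Content-Transfer-Encoding: base64\n"
    b"\n"
    b"aGk=\n"
    b"--XYZ--\n"
)


def is_pure_ascii(raw: bytes) -> bool:
    """Return True only if the whole message could be held as an ASCII str."""
    try:
        raw.decode("ascii")
    except UnicodeDecodeError:
        return False
    return True


def part_payloads(raw: bytes) -> list:
    """Parse the message from bytes and return each leaf part's body as bytes."""
    msg = email.message_from_bytes(raw)
    return [
        part.get_payload(decode=True)  # undoes base64; passes "binary" through
        for part in msg.walk()
        if not part.is_multipart()
    ]


if __name__ == "__main__":
    print(is_pure_ascii(RAW))
    print(part_payloads(RAW))
```

Only the latin-1 part needs its charset, and only once the caller actually wants text; until then everything stays bytes.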

> > About base64, I agree with Bill Janssen:
> >  - base64MIME.decode converts string to bytes
> >  - base64MIME.encode converts bytes to string
> 
> I agree.
> 
> > But decode may accept bytes as input (as the base64 module does): use
> > str(value, 'ascii', 'ignore') or str(value, 'ascii', 'strict').
> 
> Hmm, I'm not sure about this, but I think that .encode() may have to
> accept strings.

Personally, I think it would avoid more errors if it didn't.  Let the
user explicitly encode the string to a particular representation
before calling base64.encode().
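A minimal sketch of the contract discussed above, written for Python 3. The helper names `b64_body_encode` and `b64_body_decode` are hypothetical (they are not the email package's base64MIME API) and are built on the standard `base64.b64encode`/`b64decode`. Encoding takes bytes only and refuses str, so the caller must pick a charset explicitly; decoding returns bytes and also accepts ASCII bytes, converted with `str(value, 'ascii', errors)` as suggested in the thread.

```python
import base64
from typing import Union


def b64_body_encode(data: bytes) -> str:
    """Hypothetical helper: bytes in, ASCII str out.

    Deliberately rejects str, so the caller has to choose a charset first.
    """
    if isinstance(data, str):
        raise TypeError("encode the text first, e.g. text.encode('latin-1')")
    return base64.b64encode(data).decode("ascii")


def b64_body_decode(value: Union[str, bytes], errors: str = "strict") -> bytes:
    """Hypothetical helper: str (or ASCII bytes) in, bytes out.

    Bytes input is turned into str with str(value, 'ascii', errors):
    'strict' raises UnicodeDecodeError on non-ASCII bytes, while
    'ignore' silently drops them.
    """
    if isinstance(value, (bytes, bytearray)):
        value = str(value, "ascii", errors)
    return base64.b64decode(value)


if __name__ == "__main__":
    text = "caf\u00e9"
    # The same text under two charsets yields two different base64 bodies,
    # which is why the charset choice should be the caller's, not implicit.
    print(b64_body_encode(text.encode("latin-1")))
    print(b64_body_encode(text.encode("utf-8")))
    # With errors='ignore' the stray non-ASCII byte is dropped before decoding.
    print(b64_body_decode(b"aGk=\xff", errors="ignore"))
```

Defaulting to 'strict' follows the same reasoning as refusing str in the encoder: a malformed body raises instead of being silently altered.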

Bill

