[Email-SIG] fixing the current email module

Fri Oct 9 00:39:23 CEST 2009

On approximately 10/8/2009 6:00 AM, came the following characters from 
the keyboard of Barry Warsaw:
> On Oct 8, 2009, at 3:29 AM, Glenn Linderman wrote:
>> And I agree that APIs to retrieve any MIME part as undecoded bytes is 
>> appropriate; and to retrieve it as decoded strings is appropriate for 
>> text MIME parts.  Not sure that non-text MIME parts need to support 
>> being returned as strings.
>
> I hate to open another can of worms, but I've been thinking about this 
> a lot too :).  It's been discussed on list before, so nothing new 
> here.  I think the parser and MIME classes need to be hookable for 
> decoding their contents.  For example, if you have a text/* it might 
> well make sense to support bytes() and str()/unicode() on the part 
> instance.  But if it's image/* str() makes no sense.  part.decode() or 
> something similar makes sense, but this needs to be extensible because 
> the email package will not know how to convert every content-type.  At 
> best it will only know how to decode content-types that Python's 
> stdlib knows about.

Seems like the following should be obtainable from a MIME part:

1) wire format.  Either what came in, in the parser case, or what would 
be generated.
2) internal headers from the MIME part
3) decoded BLOB.  This means that quopri and base64 are decoded, no more 
and no less.  This is bytes.  No headers, only payload.  For 
Content-Transfer-Encoding: binary, this is mostly a no-op.
4) text/* parts should also be obtainable as str()/unicode(), payload 
only.  This is where charset decoding is done.
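
To make the four views above concrete, here is a rough sketch.  It 
assumes the API of the email package as it ships in current Python 3 
(message_from_bytes, policy.default, get_payload, get_content), which 
is not necessarily what the redesign under discussion would look like; 
the sample message and the variable names are made up for illustration.

```python
from email import message_from_bytes, policy

# Hypothetical sample part: Latin-1 text sent as quoted-printable.
RAW = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: text/plain; charset="iso-8859-1"\r\n'
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"caf=E9\r\n"
)

part = message_from_bytes(RAW, policy=policy.default)

# 1) Wire format: RAW is what came in; as_bytes() is what would be
#    generated from the parsed model.
wire = part.as_bytes()

# 2) Headers of the MIME part, as (name, value) pairs.
headers = part.items()

# 3) Decoded BLOB: only the Content-Transfer-Encoding (quopri or
#    base64) is undone.  The result is bytes, payload only.
blob = part.get_payload(decode=True)

# 4) text/* only: charset decoding on top of 3, giving a str.
text = part.get_content()
```

Here blob still holds the Latin-1 byte 0xE9, while text holds the 
decoded character; that is the distinction between 3 and 4.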

I think your talk in the next paragraph about hooks and other object 
types being produced is a generalization of 4, not 3, and generally no 
additional decoding needs to be done, just conversion to the right 
object type (or file, or file-like object).

> The problem is that if the bytes came off the wire, the parser 
> currently can only attach the most basic MIME base class.  It doesn't 
> know that an image/png should create a MIMEImagePNG instance there.  
> This is different from hacking the model directly because the 
> application can instantiate the right class.  So the parser either has 
> to have a hookable way for an application to go from content-type to 
> class, or the generic MIME base class needs to be hookable in its 
> .decode() method. 

So either the email package can stop at 3 (plus 4 for text/* parts 
only), or it could learn more types (registered types with 
well-defined corresponding objects could potentially be built in to 
the email package), and/or it could become hookable for application 
types.  Of course, for disposition to files, storing the BLOB in a 
file of the right name is adequate; to avoid the file, I agree that 
converting to a useful object type is handy.  But maybe file-like 
objects would suffice for most of the types.
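
As a rough illustration of the hookable option, here is one way a 
content-type registry with a file-like fallback might look.  Everything 
named here (CONVERTERS, register, convert, make_part) is hypothetical 
and not part of the email package; only message_from_bytes, 
get_payload, get_content_type, get_content_maintype and 
get_content_charset are real calls from the current Python 3 stdlib.

```python
import base64
import io
import json
from email import message_from_bytes, policy

# Hypothetical registry: "type/subtype" or "type/*" -> converter.
CONVERTERS = {}

def register(content_type):
    """Decorator that registers a converter for a content type."""
    def decorator(func):
        CONVERTERS[content_type] = func
        return func
    return decorator

@register("text/*")
def to_text(part):
    # View 4: charset decoding on top of the decoded BLOB.
    charset = part.get_content_charset("ascii")
    return part.get_payload(decode=True).decode(charset, errors="replace")

@register("application/json")
def to_json(part):
    # An application-supplied hook producing a richer object.
    return json.loads(part.get_payload(decode=True))

def convert(part):
    """Exact match first, then "type/*", else a file-like object."""
    func = (CONVERTERS.get(part.get_content_type())
            or CONVERTERS.get(part.get_content_maintype() + "/*"))
    if func is None:
        # Unknown type: hand back the decoded BLOB as a file-like object.
        return io.BytesIO(part.get_payload(decode=True) or b"")
    return func(part)

def make_part(content_type, body):
    """Build a hypothetical base64-encoded test part from raw bytes."""
    raw = (b"Content-Type: " + content_type.encode("ascii") + b"\r\n"
           b"Content-Transfer-Encoding: base64\r\n\r\n"
           + base64.b64encode(body) + b"\r\n")
    return message_from_bytes(raw, policy=policy.default)

as_text = convert(make_part('text/plain; charset="utf-8"', b"hello"))
as_json = convert(make_part("application/json", b'{"ok": true}'))
as_file = convert(make_part("image/png", b"\x89PNG"))
```

With no image/png hook registered, as_file is an io.BytesIO over the 
decoded bytes, which is the "file-like objects would suffice" case.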

-- 
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking