[Email-SIG] fixing the current email module

Sun Oct 11 07:49:25 CEST 2009

On approximately 10/10/2009 5:47 PM, came the following characters from 
the keyboard of Stephen J. Turnbull:
> Glenn Linderman writes:
>  > On approximately 10/10/2009 8:40 AM, came the following characters from 
>  > the keyboard of Stephen J. Turnbull:
>
>  > > So why are we discussing this?  We don't even know what our mainline
>  > > APIs are going to look like, why are we discussing forcibly operating
>  > > on broken input?
>  > 
>  > Use case generation.  If the only way to access header values is to 
>  > successfully, fully, decode them, then some uses may be rendered 
>  > impossible, or at least difficult, even by choice of APIs.
>
> Since invertibility is a requirement, "successfully fully decoding" a
> header field is not a prerequisite to accessing it.
>
> The question of "what should we do about broken mail" at this point
> has three components:
>
> (1) To what level do we (ie, the email module) promise to parse
>     conforming wire format into useful objects?
>
> (2) For nonconforming input, when is it OK to raise an error and
>     return to the calling client rather than handle it ourselves?
>
> (3) What is the API for accessing and/or mutating unparsed data, and
>     requesting a reparse?
>
> I don't think we should go any farther than that.
>   

I agree with your three components, but I think answering (3) requires 
some discussion of, and speculation about, what clients might want to do 
when faced with errors.  Otherwise the API is unlikely to help them much 
unless they reimplement email package logic.  It is easy to design 
"sufficient" but unhelpful APIs, so I've been willing to discuss such 
things.  Maybe at too much length, and maybe without making it clear 
enough that this is what I'm discussing, for which I apologize.  But I 
don't think that not discussing it helps to answer (3).
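
To make (3) a little more concrete, here is a purely illustrative 
sketch, not a proposal for the real API.  The HeaderField class, its 
attributes, and its reparse() method are all hypothetical names invented 
for this example.  It keeps the raw bytes next to any decoded value, so 
a client can retrieve them, replace them, and then ask for a reparse:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HeaderField:
    """Hypothetical container: wire bytes plus any successfully parsed value."""
    name: str
    raw: bytes                       # exact bytes from the wire, never discarded
    value: Optional[str] = None      # decoded text, or None if decoding failed
    defects: List[str] = field(default_factory=list)

    def reparse(self, charset: str = "ascii") -> bool:
        """Try to decode self.raw again; return True on success."""
        try:
            self.value = self.raw.decode(charset)
        except (UnicodeDecodeError, LookupError) as exc:
            # Record the problem instead of raising, and keep the raw bytes.
            self.value = None
            self.defects.append(str(exc))
            return False
        self.defects.clear()
        return True


# A client inspects the breakage, picks a repair, and requests a reparse.
subject = HeaderField("Subject", b"Caf\xe9 menu")
assert subject.reparse() is False          # not ASCII, so a defect is recorded
assert subject.raw == b"Caf\xe9 menu"      # the raw data is still retrievable
assert subject.reparse("latin-1") is True  # the client's guess at a repair
```

The point of the sketch is only that the client never has to re-split 
or re-unfold the header itself; it touches the raw bytes and hands the 
rest back to the package.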

>  > > "Re" is a Latin abbreviation; there is no appropriate translation. ;-)
>  > >   
>  > 
>  > Nonetheless, I have seen both Re: and Fwd: translated to other languages 
>  > (besides Latin or geek) :)
>
> Sure.  This is an aspect of question (1): is this the responsibility
> of the email module?
>   

I didn't think the old RFCs even discussed the use of Re: and Fwd:, or 
whether they should be collapsed, translated, or used at all.  I just 
checked: RFC 822 has an example that shows Re:, and RFC 2822 does 
discuss it a bit, and suggests not adding duplicate Re:.  Fwd: is not 
mentioned at all in those two RFCs.  So no, adding and collapsing 
Re:/Fwd: is not the responsibility of the email package.  But making it 
easy to do so might be, as it is a common client operation.  Lots of 
email style guides discuss it.
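
As an illustration of the kind of client operation I mean, here is a 
minimal sketch of collapsing reply prefixes.  The reply_subject() helper 
and the REPLY_PREFIXES list are hypothetical names, and the list is only 
a few example translations I have seen in the wild, nowhere near 
complete:

```python
import re

# Hypothetical, incomplete list: "Re" plus a few translated equivalents
# (German "AW", Scandinavian "SV", Dutch "Antw").
REPLY_PREFIXES = ("re", "aw", "sv", "antw")

# One or more "<prefix>:" groups at the start of the subject.
_PREFIX_RE = re.compile(
    r"^(?:\s*(?:%s)\s*:\s*)+" % "|".join(REPLY_PREFIXES),
    re.IGNORECASE,
)


def reply_subject(subject: str) -> str:
    """Return the subject with exactly one leading "Re: "."""
    stripped = _PREFIX_RE.sub("", subject).strip()
    return "Re: " + stripped


assert reply_subject("Re: AW: RE: lunch") == "Re: lunch"
assert reply_subject("Regarding lunch") == "Re: Regarding lunch"
```

Note that this only works on an already decoded subject, which is 
exactly why the question of access to undecodable header values matters.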

>  > > Maybe they are, but the email module doesn't know or care about what
>  > > they do.  Let's stick within what the email module is supposed to
>  > > handle
>  > 
>  > Yep, this is just use case exploration.
>
> But since by definition this is broken input, discussing what
> applications are going to want to do with it is inappropriate, IMO.
> We don't care if the app is going to prefix, suffix, or crucifix it.
> We need to specify
>
> (a) what object will hold the raw data we couldn't handle
> (b) how a calling client can retrieve the raw data
> (c) how the client can replace (or more generally mutate) that data
> (d) how the client can request a reparse from us if it attempted to
>     repair the breakage at a low level rather than parse it
>
> Manipulations of text or bytes are in principle not the responsibility
> of the email module IMO; that will be done *by* the client *using* raw
> Python, not methods provided by email.  I don't see how discussion of
> *what* manipulations can be done with one hand up our nose is anything
> but useless bikeshedding.
>
> If we decide that the email module can usefully provide sufficiently
> general facilities that would be convenient and hard to implement by
> general client programmers (eg, the Mailman Developers collective
> wisdom about foreign equivalents for "re" and "fwd" is surely greater
> than that of the average American programmer), we will do it by
> calling low-level methods to get and put the data, and raw Python to
> manipulate it as text or bytes

Except it may be perfectly valid input using a standard that post-dates 
the application.  Doing something reasonable with it is appropriate.  
The email RFCs go to great lengths to make new features work reasonably 
in old clients that have limited understanding, with fallback 
interpretations for unknown MIME subtypes and even unknown MIME types, 
so that some reasonable interpretation can still be made.  The RFCs 
also describe how new MIME types, subtypes, and charsets can be 
defined, so it seems reasonable to accommodate the possibility that 
such things will actually be defined in the future.
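
For example, the fallback rules in RFC 2046 can be sketched roughly as 
follows.  The fallback_content_type() function and the KNOWN_TYPES set 
are hypothetical names for this illustration, and the sketch ignores 
charset handling entirely; only message_from_string() and 
get_content_type() are existing email package calls:

```python
from email import message_from_string

# Hypothetical set of content types this particular client understands.
KNOWN_TYPES = {"text/plain", "text/html", "multipart/mixed",
               "multipart/alternative", "application/octet-stream"}


def fallback_content_type(content_type: str, known=KNOWN_TYPES) -> str:
    """Map an unrecognized content type to its RFC 2046 fallback."""
    content_type = content_type.lower()
    if content_type in known:
        return content_type
    maintype = content_type.partition("/")[0]
    if maintype == "text":
        return "text/plain"            # unknown text subtype
    if maintype == "multipart":
        return "multipart/mixed"       # unknown multipart subtype
    return "application/octet-stream"  # anything else, including new types


# A message using a subtype this client has never heard of.
msg = message_from_string(
    "Content-Type: text/x-future; charset=us-ascii\n\nhello\n")
assert msg.get_content_type() == "text/x-future"
assert fallback_content_type(msg.get_content_type()) == "text/plain"
```

An old client following these rules does something sensible with a 
message it cannot fully understand, instead of treating it as broken.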

If we don't discuss some of the possibilities, we'll never learn enough 
to "decide that the email module can usefully provide sufficiently 
general facilities that would be convenient and hard to implement by 
general client programmers" :)

To me, "hard" would mean that they would have to rewrite portions of 
logic that already exists in the email package, and then tweak it 
slightly to compensate for not-quite-perfect data, or maybe I should 
switch to saying "not-quite-perfect-or-possibly-later-standardized data" :)

-- 
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking