[Email-SIG] API thoughts

Tue Mar 1 23:59:10 CET 2011

On Tue, 01 Mar 2011 13:58:50 -0800, Glenn Linderman <v+python at g.nevcal.com> wrote:
> To support reading byte-stream HTTP headers, therefore, it is critical 
> that the email API accept an encoding from the application which "knows" 
> the encoding; presently cgi.py has to pre-decode incoming headers 
> because email does not have such a parameter.  On the other hand, maybe 
> cgi.py shouldn't use email header parsing at all... since browsers don't 
> use RFC 2047 encoding in practice, the parsing of headers without such 
> is straightforward.

I think it could make sense for the default input character set to be
a policy parameter for the parser.  Maybe not in the first version,
though :)
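
To make that concrete, here is a minimal sketch of a parser that takes its
input character set from a policy object.  The names Policy, input_charset
and parse_header_line are made up for illustration; none of them is an
existing email API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    # Hypothetical policy object; 'input_charset' is an assumed name,
    # not a parameter of the current email package.
    input_charset: str = "ascii"


def parse_header_line(raw, policy=Policy()):
    """Split one 'Name: value' byte string and decode it per the policy."""
    name, _, value = raw.partition(b":")
    # Header names are ASCII; only the value uses the application's charset.
    return (
        name.decode("ascii").strip(),
        value.decode(policy.input_charset, errors="replace").strip(),
    )


# The application (e.g. a CGI script) "knows" the browser sent UTF-8.
name, value = parse_header_line(
    b'Content-Disposition: form-data; name="f"; filename="caf\xc3\xa9.txt"',
    Policy(input_charset="utf-8"),
)
```

With something like this, cgi.py would not need to pre-decode the headers
itself; it would just hand the charset it knows about to the parser.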

Yes, it is simple(r) to parse headers if you don't have to worry about
RFC 2047, but why duplicate code if you don't need to?  This assumes,
of course, that email6 does what cgi.py and similar programs need,
but I'll try to keep my eye on that.

> Further, HTTP data streams can be extremely large, and thus 
> time-consuming to obtain over the wire.  CGI applications cannot afford 
> to keep large blocks of data in RAM during receipt, thus if email wishes 
> to support CGI, it needs features for placing large blocks of data on 
> disk instead of in RAM during the parsing phase; cgi.py presently has to 
> preparse headers, to separate them from the data streams, which it then 
> handles on its own, because of this issue.

It is already in the plan to add disk caching support to the base email
API, so this will get addressed.  You may even be the one who suggested
designing the API as a general "storage" API so that different back-ends
can be hooked up.  In any case, that's what I've got in mind.
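
A rough sketch of what a general "storage" API could look like, with one
in-memory and one on-disk back end.  The class and method names
(MemoryStore, DiskStore, write, open, store_payload) are assumptions for
illustration only, not a settled design.

```python
import io
import tempfile


class MemoryStore:
    """Hypothetical back end that keeps the payload in RAM."""

    def __init__(self):
        self._buf = io.BytesIO()

    def write(self, chunk):
        self._buf.write(chunk)

    def open(self):
        # Hand back an independent reader positioned at the start.
        return io.BytesIO(self._buf.getvalue())


class DiskStore:
    """Hypothetical back end that spills the payload to a temporary file."""

    def __init__(self):
        self._file = tempfile.TemporaryFile()

    def write(self, chunk):
        self._file.write(chunk)

    def open(self):
        self._file.seek(0)
        return self._file


def store_payload(chunks, store):
    """What a parser would do: feed body chunks to whichever store it got."""
    for chunk in chunks:
        store.write(chunk)
    return store
```

The parser only ever calls write(), so it does not care where the bytes
end up.  The standard library's tempfile.SpooledTemporaryFile already
covers the "small in RAM, large on disk" case and could serve as one such
back end.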

> There is, by the way, room for improvement in the cgi.py handler for 
> HTTP data streams; presently all large MIME objects are written to disk 
> (but small ones are kept as string or byte streams), but it isn't 
> necessarily the right disk, and the data must then be again copied, byte 
> by byte, to its final file system location.  I see that as abhorrent 
> overhead.  There is presently no provision for hooks that ask the CGI 
> application what to do with the data being received, while it is being 
> received, nor for policies to assist with better heuristics, with the 
> goal in mind that a properly and completely received MIME object could 
> then be renamed to its final location rather than copied.

I think the hookable storage back end addresses this, but the concrete
implementation (eventually) provided by email ought to support it as well.
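
As a sketch of how a back end could allow rename-instead-of-copy: write
the incoming data to a temporary file in the destination directory, then
rename it once the object has been completely received.  RenamingDiskStore
and its commit/abort methods are hypothetical names; only the tempfile and
os calls are real.

```python
import os
import tempfile


class RenamingDiskStore:
    """Hypothetical back end: receive into the target directory, then rename."""

    def __init__(self, final_path):
        self.final_path = final_path
        target_dir = os.path.dirname(os.path.abspath(final_path))
        # Same directory means same file system, so the rename is cheap.
        self._tmp = tempfile.NamedTemporaryFile(dir=target_dir, delete=False)

    def write(self, chunk):
        self._tmp.write(chunk)

    def commit(self):
        """Call once the MIME object has been received completely."""
        self._tmp.close()
        os.replace(self._tmp.name, self.final_path)
        return self.final_path

    def abort(self):
        """Call if receipt fails; discards the partial file."""
        self._tmp.close()
        os.unlink(self._tmp.name)


# Usage: the application picks the final location up front.
with tempfile.TemporaryDirectory() as upload_dir:
    final_path = os.path.join(upload_dir, "upload.bin")
    store = RenamingDiskStore(final_path)
    for chunk in (b"first ", b"second"):
        store.write(chunk)
    store.commit()
    with open(final_path, "rb") as f:
        data = f.read()
```

The application hook then boils down to answering one question while the
data is arriving: which path should this part end up at?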

> > I guess I'm proposing, then, that there be an API version definition,
> > with two values as of Python3.3: email5 API, and email6 API.  We'll
> > figure out how we name and interrogate these formally later.
> 
> Question: While it is pretty clear that enhanced behaviors are required 
> to benefit new applications that use email, and while some new APIs may 
> be incompatible with some existing APIs, might it be possible to design 
> the new API, and then build a compatibility layer that looks like the 
> old API on top?  Such that there would be policies for the new APIs that 
> would work like the old APIs to ease the implementation of such a 

Yes, this is what was behind my comment that I had further ideas
about backward compatibility.  One way is what Barry and I already
discussed:  a wrapper to put around an email6 object that would support
the email5 API.  Another approach is to have the email6 message itself
support the legacy API.  I haven't looked at every method, but most
of them would be supportable.  The tricky bit is headers:  an email6
Message will return Header objects, whereas an email5 application will
generally expect to get strings.  (It shouldn't!  But many will.  Even the
email package itself expects to get strings when it accesses headers.)
My wild thought at this point is:  what if Header subclassed string?
With the exception of a few structured headers such as address headers,
this might actually work pretty well.  But experimentation with some
at least semi-real-world examples would be needed to prove out the
concept.
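
A minimal sketch of the "Header subclasses string" idea, just to show the
mechanics.  This Header class is an illustration, not the email6 Header,
and the 'name' attribute is an assumed extra.

```python
class Header(str):
    """Illustrative only: a str that also carries header metadata."""

    def __new__(cls, name, value):
        # str is immutable, so the text has to be set in __new__.
        self = super().__new__(cls, value)
        self.name = name
        return self


subject = Header("Subject", "API thoughts")

# Legacy code that expects a plain string keeps working.
legacy_ok = subject.startswith("API") and subject + "!" == "API thoughts!"

# New code can use the extra attribute.
header_name = subject.name

# Caveat: str methods return plain str, so the metadata does not survive.
upper_type = type(subject.upper())
```

The caveat on the last line is one of the things that experimentation
would have to show is tolerable in practice.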

> layer?  I'm not sure I fully understand the use of _factory or factory 
> parameters, but for APIs that have _factory and grow a factory, could 
> not the presence of which parameter imply any variant functionality?

I'm not sure what you are asking here.  In what I outlined for the parser
API, you'd get an email5-API object if you used _factory or nothing,
and an email6 API object if you used factory, so yes, in that sense
the parameter determines the API.  But what about a library that is
accepting a Message object?  It needs a way to detect whether or not
it has been passed an email5 API message, or an email6 one.
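
One possible shape for that detection, purely as a sketch: an attribute
on the message that a library can interrogate.  The attribute name
api_version and the constants are assumptions; how this is actually named
is still to be decided.

```python
EMAIL5, EMAIL6 = 5, 6  # hypothetical API version values


class LegacyMessage:
    """Stand-in for an email5 message; it has no version attribute."""


class NewMessage:
    """Stand-in for an email6 message that advertises its API."""
    api_version = EMAIL6


def api_version_of(msg):
    # Objects that predate the attribute are assumed to be email5.
    return getattr(msg, "api_version", EMAIL5)


def describe(msg):
    """What a library accepting a Message might branch on."""
    if api_version_of(msg) >= EMAIL6:
        return "headers are Header objects"
    return "headers are strings"
```

Defaulting to email5 when the attribute is missing means existing Message
objects need no changes to be detected correctly.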

> (OK, this question comes after not looking at the email API during all 
> the GSOC and your implementation efforts since the last big round of 
> discussion, but your proposals here seem to sound like it would be more 
> possible with your current thinking than with your previous thinking.)

Well, in my previous thinking I was intending to do much the same thing
as far as backward compatibility went (having a policy that provided an
email5-compatible object); I just hadn't talked about it much :)  The
biggest difference now is that email5 will be the default, at least in
the Python 3.3 release.

> Consider me an interested observer; I'll enjoy reading, thinking, and 
> commenting about these ideas too, but sadly am unlikely to implement an 
> email client this year :(  But I have aspirations to do so, because none 
> of the existing email clients exactly suit my preferences... (everyone 
> should write an editor and an email client, no?  I've done the former 
> several times... what I want, though, is emacs-python, instead of 
> emacs-lisp).

Thanks for your attention and comments.  I haven't implemented an editor
yet (VIM + Python has been good enough so far), but I have implemented
parts of an email client, and intend to finish that project as part of
working on email6, as an API test bed.

--David