[Email-SIG] email package status in 3.X

Thu Jun 10 16:18:48 CEST 2010

On Thu, 10 Jun 2010 09:21:52 -0400, lutz at rmi.net wrote:
> In other words, some of my concern may have been a bit premature.  
> I hope that in the future we'll either strive for compatibility 
> or keep the current version around; it's a lot of very useful code.

The plan is to have a compatibility layer that will accept calls based
on the old API and forward appropriately to the new API.  So far I'm
thinking I can succeed in doing this in a fairly straightforward manner,
but I won't know for sure until I get some more pieces in place.

> In fact, I recommend that any new email package be named distinctly, 

I'm going to avoid that if I can (though the PyPI package will be
named email6 when we publish it for public testing).  If, however,
it turns out that I can't correctly support both the old and the
new API, then I'll have to do that.

> and that the current package be retained for a number of releases to
> come.  After all the breakages that 3.X introduced in general, doing
> the same to any email-based code seems a bit too much, especially 
> given that the current package is largely functional as is.  To me,
> after having just used it extensively, fixing its few issues seems 
> a better approach than starting from scratch.

Well, the thing is, as you found, existing 2.x code needs to be fixed to
correctly handle the distinction between strings and bytes no matter what.
The goal is to make it easier to write correct programs, while providing
the compatibility layer to make porting smoother.  But I doubt that any
non-trivial 2.x email program will port without significant changes,
even if the compatibility layer is close to 100% compatible with the
current Python3 email package, simply because the previous conflation
of text and bytes must be untangled in order to work correctly in
Python3, and email involves lots of transitions between text and bytes.

As for "starting from scratch", it is true that the current plan involves
considerable changes in the recommended API (in the direction of greater
flexibility and power), but I'm hoping that significant portions of the
code will carry forward with minor changes, and that this will make it
easier to support the old API.

> As far as other issues, the things I found are described below my
> signature.  I don't know what the utf-8 issue is that you refer 
> to; I'm able to parse and send with this encoding as is without 
> problems (both payloads and headers), but I'm probably not using the
> interfaces you fixed, and this may be the same as one of the items listed.

It is, see below.

> Another thought: it might be useful to use the book's email client 
> as a sort of test case for the package; it's much more rigorous in 
> the new edition because it now has to be, given 3.X's Unicode model 
> (it's about 4,900 lines of code, though not all is email-related).
> I'd be happy to donate the code as soon as I find out what the 
> copyright will be this time around; it will be at O'Reilly's site
> this Fall in any event.

That would be great.  I am planning to write my own sample app to
demonstrate the new API, but if I can use yours to test the compatibility
layer that will help a lot, since I otherwise have no Python3 email
application to test against unless I port something from Python2.

> Major issues I found...
> ------------------------------------------------------------------
> 1) Str required for parsing, but bytes returned from poplib
> 
> The initial decode from bytes to str of full mail text; in 
> retrospect, probably not a major issue, since original email 
> standards called for ASCII.  An 8-bit encoding like Latin-1 is
> probably sufficient for most conforming mails.  For the book,
> I try a set of different encodings, beginning with an optional
> configuration module setting, then ascii, latin-1, and utf-8;
> this is probably overkill, but a GUI has to be defensive.

This works (mostly) for conforming email, but some important Python email
applications need to deal with non-conforming email.  That's where the
inability to parse bytes directly really causes problems.
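
A minimal sketch of the fallback decoding described in the quoted item,
assuming the raw message arrives as bytes (as it does from poplib).  The
helper name decode_mail_bytes and its preferred parameter are hypothetical,
not part of the email package.  The sketch tries utf-8 before latin-1,
which differs from the order in the quoted text, because latin-1 accepts
every byte value and anything listed after it would never be tried.

```python
import email


def decode_mail_bytes(raw, preferred=None):
    """Decode raw mail bytes to str, trying several encodings in turn.

    Hypothetical helper: 'preferred' stands in for an optional
    configuration setting and is tried first when given.
    """
    candidates = ([preferred] if preferred else []) + ["ascii", "utf-8", "latin-1"]
    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            # Wrong guess, or an unknown encoding name: try the next one.
            continue
    # Not reached in practice: latin-1 maps every byte to a code point.
    raise ValueError("could not decode message bytes")


# The body contains a lone 0xE9 byte, which is neither ASCII nor valid UTF-8.
raw = b"From: a@example.com\r\nSubject: test\r\n\r\ncaf\xe9\r\n"
msg = email.message_from_string(decode_mail_bytes(raw))
```

Note that this only produces some str that the parser will accept; for
non-conforming mail the chosen encoding may simply be the wrong guess.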

> 2) Binary attachments encoding
> 
> The binary attachments byte-to-str issue that you've just
> fixed.  As I mentioned, I worked around this by passing in a 
> custom encoder that calls the original and runs an extra decode
> step.  Here's what my fix looked like in the book; your patch 
> may do better, and I will minimally add a note about the 3.1.3
> and 3.2 fix for this:

Yeah, our patch was a lot simpler since we could fix the encoding inside
the loop producing the encoded lines :)
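
For readers without the book at hand, here is a rough sketch of the kind
of workaround described in the quoted item: a custom encoder that calls
the stock one and then makes sure the payload is str.  This is not the
code from the book and not our patch; the name encode_base64_as_str is
hypothetical.  On a Python where the fix is already in, the extra step
should simply do nothing.

```python
from email import encoders
from email.mime.application import MIMEApplication


def encode_base64_as_str(msg):
    """Run the stock base64 encoder, then ensure the payload is str."""
    encoders.encode_base64(msg)
    payload = msg.get_payload()
    if isinstance(payload, bytes):
        # base64 output is pure ASCII, so this decode cannot fail.
        msg.set_payload(payload.decode("ascii"))


data = bytes(range(256))  # stand-in for real attachment data
part = MIMEApplication(data, _encoder=encode_base64_as_str)
```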

> 3) Type-dependent text part encoding
> 
> There's a str/bytes confusion issue related to Unicode encodings
> in text payload generation: some encodings require the payload to
> be str, but others expect bytes.  Unfortunately, this means that 
> clients need to know how the package will react to the encoding 
> that is used, and special-case based upon that.  

This was the UTF-8 bug I fixed.  I shouldn't have called it "the UTF-8
bug", because it applies equally to the other charsets that use base64,
as you note.  I called it that because UTF-8 was where the problem was
noticed and is mentioned in the title of the bug report.

I had a suspicion that the quoted-printable encoding wasn't being done
correctly either, so to hear that it is working for you is good news.
There may still be bugs to find there, though.

So, in the next releases of Python, all MIMEText input should be a string,
and it will fail if you pass bytes.  I regard the old behavior as email
not living up to its published API, but do you think I should hack
in a way for it to accept bytes too, for backward compatibility in the
3.x line?
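
To make the str-only rule concrete, here is a small sketch of what calling
code might look like.  The wrapper make_text_part is hypothetical, and the
explicit type check is my own guard for illustration; it is not a claim
about how MIMEText itself reacts to bytes in any particular release.

```python
from email.mime.text import MIMEText


def make_text_part(text, charset="utf-8"):
    """Build a text part, insisting that the caller passes str."""
    if not isinstance(text, str):
        raise TypeError("text must be str, not %s" % type(text).__name__)
    return MIMEText(text, _charset=charset)


part = make_text_part("na\u00efve caf\u00e9")
# The caller hands in str; the package chooses the transfer encoding.
cte = part["Content-Transfer-Encoding"]
# Decoding the payload undoes the transfer encoding and yields bytes.
wire_bytes = part.get_payload(decode=True)
```

The point is that the client never needs to special-case on the charset:
it always supplies str and lets the package decide how to encode it.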

> There are some additional cases that now require decoding per mail 
> headers today due to the str/bytes split, but these are just a 
> normal artifact of supporting Unicode character sets in general,
> and seem like issues for package clients to resolve (e.g., the bytes 
> returned for decoded payloads in 3.X didn't play well with existing 
> str-based text processing code written for 2.X).

I'm not following you here.  Can you give me some more specific
examples?  Even if these "normal artifacts" must remain with
the current API, I'd like to make things as easy as practical when
using the new API.
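
As a guess at one case the quoted text may mean (an assumption on my part,
not something confirmed in this thread): get_payload(decode=True) returns
bytes in 3.X, so str-based text processing needs one more decode step
using the part's declared charset.  The helper payload_as_text and its
default parameter are hypothetical.

```python
from email.mime.text import MIMEText


def payload_as_text(part, default="ascii"):
    """Return a text part's payload as str, using its declared charset."""
    raw = part.get_payload(decode=True)  # bytes, or None for multipart
    if raw is None:
        return ""
    charset = part.get_content_charset() or default
    # errors="replace" keeps going on bad bytes instead of raising;
    # an unknown charset name would still raise LookupError here.
    return raw.decode(charset, errors="replace")


part = MIMEText("gr\u00fc\u00dfe", _charset="utf-8")
text = payload_as_text(part)
```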

Thanks for all your feedback!

--David