[Python-Dev] Email6 status (was Open PEPs and large-scale changes for 3.3)

R. David Murray rdmurray at bitdance.com
Tue May 1 16:40:08 CEST 2012


On Tue, 01 May 2012 13:57:50 +0200, Georg Brandl <g.brandl at gmx.net> wrote:
> Other planned large-scale changes:
> 
> * Addition of the "regex" module
> * Email version 6

I guess it's time to talk about my plans for this one :)

RIM/QNX is currently paying me to work on their stuff rather than email6
(though that does leave me some time for email6).  However, while QNX
directly funded a big chunk of email6, as a consequence of their current
priorities the whole of the email6 spec isn't going to be implemented
for Python 3.3.

There is, however, a very useful big chunk of it that is pretty much done:
the improved header parsing, header API, and header folding.  I covered
the primary improvements in my PyCon talk, for those who were there or
have seen the video.

Even that is not quite complete, but I'm currently planning to finish
it before alpha 4.  (There may be a couple of details that won't make it
in until beta 1.)

At the PyCon sprints I finished the folding implementation.  It's every
bit as ugly as the old folding implementation that I simplified some
time ago, but it gets a lot more corner cases right, and implements an
important feature that the old folding algorithm got wrong more often
than not: folding at "higher level syntactic breaks".  So while I'd like
to revisit that code and improve it, it *works*, and any further work
on it can be treated as bug fixing.
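
To make "folding at higher level syntactic breaks" concrete, here is a
minimal sketch.  It assumes the policy API as found in current Python 3
releases (email.policy.default and EmailMessage), which may differ in
detail from the email6 branch discussed here.  The addresses and the
40-column limit are made up for illustration.

```python
from email import policy
from email.message import EmailMessage

# Assumption: a deliberately narrow line limit, so the header must be folded.
narrow = policy.default.clone(max_line_length=40)

msg = EmailMessage(policy=narrow)
msg["To"] = (
    "alice@example.com, bob@example.com, "
    "carol@example.com, dave@example.com"
)

# Ask the policy to fold the structured header.  The folder prefers breaks
# between syntactic units (here, between addresses) over arbitrary columns.
folded = narrow.fold("To", msg["To"])
print(folded)
```

The point is only that each address should survive intact on a single
line; the exact line breaks chosen are up to the folding code.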

Also at the sprints I started on a performance refactoring.  It has been
bothering me for a while that any program using the new code would have
been doing a complete RFC 5322 parse on every header in every message,
even if it was processing a boatload of messages, only cared about the
content of a few headers, and wanted to just pass the rest through.
I had been treating a fix for that as a premature optimization, though
I had some thoughts about how to do it.

Well, to my great surprise, the most logical way of fixing it turned out
to have two significant benefits: the code got simpler, and it provides
a way to maintain pretty much 100% backward compatibility with Python 3.2.
I guess some optimizations aren't premature.

The basic scheme (which I have almost completely implemented in the email6
feature repo at this point) is to continue to store the raw data from
a parse in the Message just like we always have, and only do the full
RFC 5322 parse when either an application program asks for the header,
or a generator needs to re-fold that header for some reason.  By setting
the policy controls appropriately and being aware of the consequences
of looking at a header, an application can take advantage of the new
header parsing for headers of interest with minimal performance impact
compared to 3.2.
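
The parse-on-demand idea can be sketched roughly as follows.  This is
not the code in the email6 repo: the class LazyHeaders and its
parse_count attribute are hypothetical names, and the standard
library's email.utils.getaddresses stands in for the full RFC 5322
parse.

```python
from email.utils import getaddresses


class LazyHeaders:
    """Hypothetical container: keep raw header text, parse only on access."""

    def __init__(self, raw_pairs):
        self._raw = list(raw_pairs)   # (name, raw value) exactly as parsed
        self._parsed = {}             # cache of headers somebody asked for
        self.parse_count = 0          # illustration only: counts real parses

    def __getitem__(self, name):
        key = name.lower()
        if key not in self._parsed:
            for raw_name, raw_value in self._raw:
                if raw_name.lower() == key:
                    self.parse_count += 1
                    # Stand-in for the expensive structured parse.
                    self._parsed[key] = getaddresses([raw_value])
                    break
            else:
                raise KeyError(name)
        return self._parsed[key]

    def flatten(self):
        # Pass-through output uses the raw text and triggers no parsing.
        return "".join(f"{n}: {v}\n" for n, v in self._raw)


headers = LazyHeaders([
    ("To", "Alice <alice@example.com>, bob@example.com"),
    ("X-Junk", "some   oddly spaced value"),
])
passthrough = headers.flatten()   # no header has been parsed yet
recipients = headers["to"]        # only now is the To header parsed
```

A program that merely copies messages through would never pay for a
parse; one that reads only To pays for exactly one.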

Now, here's the tricky bit.  The new API for headers has been out on PyPI
for review for almost a year now, but hasn't seen what you would call
widespread use.  In particular, I haven't gotten any feedback about it.
It seems to me that introducing this new API in 3.3 would be a perfect
application of PEP 411...except that email is already a package in the
standard library.

This is where the backward-compatibility of my performance refactor
comes in.  The way this works is that the policy object, which has already
been added to the 3.3 codebase and *has* gotten some review and feedback,
controls what happens to the headers.  The way the code in the 'nemail6'
branch of /features/email6 currently works is that the policy used by
default is named 'compat32'.  (Actually it's compat5 right now in the
repository, but I plan to change the name today.)  That policy implements
the exact same header handling that 3.2 currently uses (bugs and all).

The new header handling is introduced by any *other* pre-defined policy
an application may select.  Thus, if code is not changed to use one
of the new named policies, nothing changes and we have full backward
compatibility.  If a policy is specified, then the new header handling
code (and the API it provides) is used.
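
From the application's side, the opt-in looks roughly like the sketch
below.  It uses the names found in current Python 3 releases
(email.policy.default, and the addresses attribute on address headers);
those may not match the state of the feature branch described here, and
the message text is invented.

```python
import email
from email import policy

RAW = (
    "From: Alice <alice@example.com>\n"
    "To: Bob <bob@example.com>, carol@example.com\n"
    "Subject: hello\n"
    "\n"
    "body\n"
)

# No policy argument: the backward-compatible behavior, headers are plain str.
legacy = email.message_from_string(RAW)
legacy_to = legacy["To"]

# Explicit policy: the new header parsing and the richer header API.
modern = email.message_from_string(RAW, policy=policy.default)
modern_to = modern["To"]
addresses = [a.addr_spec for a in modern_to.addresses]
```

Existing code that never passes a policy keeps seeing plain strings, so
nothing changes for it.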

What I'm currently preparing is two patches.  The first patch will
refactor the policy code that was already committed so that the above
scheme can be implemented, and so that compat32 is the default policy
for 3.3.  (This is the 'nemail6base' branch in /features/email6.)

The second patch will use the policy hooks introduced by the first
patch to add the new policies that use the new header parsing/folding
code.

My plan is that the first patch will go into 3.3 regardless (and should
be ready for review/commit soon).

What I'd like to do is have the second patch introduce the new policies
as *provisional policies*.  That is, in the spirit but not the letter
of PEP 411, I'd like the new header API to be considered provisional
and subject to improvement in 3.4 based on what we learn by having it
actually out there in the field and getting tested.

--David

