[Email-SIG] Email6 repository, and policy framework first draft

Tue Mar 29 01:39:21 CEST 2011

I've set up the feature branch for email6:

    http://hg.python.org/features/email6

The branch inside the repo is email6.  I'll probably wind up having
subbranches unless my proposals get approved quickly :)

So far I've checked in the first draft of my proposal for the policy
framework.  I've blogged about this:

    http://www.bitdance.com/blog/2011/03/28_01_Policy_Framework_First_Draft/

Here's the text version of the blog post:

2011-03-28 Policy Framework First Draft
=======================================

Last week turned out to be mostly about tests and bugs.  As per my last
post, I moved the tests into a test package.  Then I went on to add a
bunch of `additional tests`_ developed by Michael Henry at the PyCon sprints.
More tests are always good before starting to modify code, right?

.. _additional tests: http://bugs.python.org/issue11589

Michael's tests had revealed a couple of bugs, though, so I then went on to
apply the `fix`_ for those bugs, which included a `rewritten algorithm`_
for encoding strings as quoted printable.  I adapted the algorithm
proposed by Michael, then discovered a different and probably `better
algorithm`_ had already been proposed a while back and gotten lost in the
tracker.  That proposed patch was against the email package in Python 2,
though, and the corresponding code in Python 3 has a different interface,
so the patch wasn't easily adapted.  Since there are other changes
that need to be made to the quoted printable encoder, I have deferred
implementing the better algorithm until I get as far as touching that
code for the email6 work.

.. _fix: http://bugs.python.org/issue11590
.. _rewritten algorithm: http://bugs.python.org/issue11606
.. _better algorithm: http://bugs.python.org/issue5803

There was also a `bug`_ in the Email5 API that I wanted to fix before
starting to make API changes.  When you deal with "dirty" headers in
Email5.1, you may get back a ``Header`` object when querying a header.
Now, the normal way to deal with crazy headers in Email5 is to pass them
to ``decode_header`` to get back the pairs of original wire bytes and
character sets.  But ``decode_header`` wasn't accepting a ``Header``
object as input.  My first approach was to try shifting back to
returning strings even when the header was "dirty", by wrapping them up
in encoded words with the ``unknown-8bit`` charset.  That more or less
worked, but doing it that way would mean making some other changes
to methods such as ``get_param`` to handle headers that had gotten
re-encoded into encoded words.  This was far from optimal.  The reporter
of the bug pointed out that I had carefully documented that ``Message``
would return a ``Header`` if the source header had unencoded non-ASCII
bytes in it, which made changing this behavior in a bug fix release
a non-starter.  So I gave in and just fixed ``decode_header`` to handle
``Header`` objects.  Since *all* headers in email6 will be a (new type of)
``Header`` object, programmers may as well get used to dealing with them.

.. _bug: http://bugs.python.org/issue11584
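
As a rough illustration of the fixed behavior, the snippet below calls
``decode_header`` once with an encoded-word string and once with a
``Header`` object.  It uses the ``email.header`` API as found in current
Python 3 releases; the sample header value is made up for the example.

```python
from email.header import Header, decode_header

# An encoded-word string, as it might arrive on the wire.
pairs_from_str = decode_header("=?iso-8859-1?q?caf=E9?=")

# A Header object carrying the same text.  Passing this to
# decode_header is the case the fix addresses.
header = Header("caf\xe9", charset="iso-8859-1")
pairs_from_header = decode_header(header)

# Each result is a list of (bytes, charset) pairs.
print(pairs_from_str)
print(pairs_from_header)
```

Either way the caller gets the raw bytes plus the charset name, so code
that already handles the string case needs no special branch for
``Header`` objects.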

For email6 itself, there is now a `feature branch`_ where I will do
the patch development for email6 before applying the changes to the
main cpython repository.  The branch is named ``email6``, of course.
Anyone may browse or clone this repository to take a look at the current
state of development.

.. _feature branch: http://hg.python.org/features/email6

And that current state is that I have checked in the first draft of
the Policy framework.  This consists of a new module, `policy.py`_,
the associated documentation, `policy.rst`_, and a set of tests,
`test_policy.py`_.

.. _policy.py: http://hg.python.org/features/email6/file/email6/Lib/email/policy.py
.. _policy.rst: http://hg.python.org/features/email6/file/email6/Doc/library/email.policy.rst
.. _test_policy.py: http://hg.python.org/features/email6/file/email6/Lib/test/test_email/test_policy.py

The basic idea is that a ``Policy`` object is an immutable container
for a bunch of attributes and callback hooks.  You can call a ``Policy``
object to get a new one with some of the defaults changed.  And you can
add them together, with the non-default settings from the right operand
overriding those from the left operand.
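
The mechanics described above can be sketched in a few lines.  This is
not the code in ``policy.py``: the class name ``SketchPolicy``, its
attribute names, and its defaults are all assumptions chosen only to show
an immutable object that can be called and added.

```python
class SketchPolicy:
    """Hypothetical immutable settings container (illustration only)."""

    # Assumed attribute names and defaults, not taken from the branch.
    _defaults = {"linesep": "\n", "max_line_length": 78,
                 "raise_on_defect": False}

    def __init__(self, **overrides):
        unknown = set(overrides) - set(self._defaults)
        if unknown:
            raise TypeError("unknown policy attributes: %s" % sorted(unknown))
        # Bypass our own __setattr__, which rejects every assignment.
        object.__setattr__(self, "_overrides", dict(overrides))

    def __getattr__(self, name):
        # Reached only when normal lookup fails, i.e. for policy settings.
        if name in self._defaults:
            return self._overrides.get(name, self._defaults[name])
        raise AttributeError(name)

    def __setattr__(self, name, value):
        raise AttributeError("policy objects are immutable")

    def __call__(self, **changes):
        # Calling a policy returns a new one with some settings changed.
        return type(self)(**{**self._overrides, **changes})

    def __add__(self, other):
        if not isinstance(other, SketchPolicy):
            return NotImplemented
        # Non-default settings of the right operand win.
        return type(self)(**{**self._overrides, **other._overrides})


base = SketchPolicy()
crlf = base(linesep="\r\n")
picky = base(raise_on_defect=True)
combined = crlf + picky
```

Here ``combined`` carries the changed line separator from ``crlf`` and the
flag from ``picky``, while ``base`` itself is left untouched.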

So far we have policies such as:

    * default
    * SMTP
    * HTML
    * Strict

*default* may get renamed *email6*. I'd prefer 'default', since that's
what I'd like it to be by the time we get to Python 3.4.  The actual
default policy when I start adding the parameter to other classes and
functions will be *email5*, though, so the name *default* for email6 is
probably not going to work.

The *SMTP* policy is just like default, but generates "wire format" line
separators (``\r\n``).  *HTML* is like *SMTP*, but does not wrap headers.
*Strict* sets a flag that will (once I implement it) cause the parser to
raise errors when it encounters defects instead of just keeping track
of them.  Using *Strict* is where you can see the utility of adding
policies together::

    >>> StrictSMTP = SMTP + Strict

You could use StrictSMTP to parse an incoming SMTP message where you
wanted your program to blow up if the message was invalid.  (When would
you ever want that?  I don't know, but someone probably will!)
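
For comparison, here is the same combination written against the
``email.policy`` module as it was later released in the standard library.
That is an assumption relative to this draft: in the released module the
strict policy is spelled ``strict`` in lowercase, and other names may
differ from the ones in the branch.  The defective sample message is made
up.

```python
from email import errors, message_from_string, policy

# Right operand's non-default settings override the left operand's.
strict_smtp = policy.SMTP + policy.strict

# A multipart message with no boundary parameter is a known defect.
raw = "Content-Type: multipart/mixed\n\nbody without a boundary\n"

try:
    message_from_string(raw, policy=strict_smtp)
except errors.MessageDefect as exc:
    # With the strict flag set, the parser raises instead of recording.
    raised = type(exc).__name__
else:
    raised = None

print(repr(strict_smtp.linesep), strict_smtp.raise_on_defect, raised)
```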

So far I've only defined one hook, ``register_defect``.  You could
subclass ``Policy`` and define your own ``register_defect`` method that
would, say, log all defects to a log file, thus giving you some idea of
the quality of the email being processed by your program, even if you
did nothing else with the defect info.
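
A minimal sketch of such a logging subclass follows.  It is written
against the released ``email.policy`` module, so the
``register_defect(obj, defect)`` signature is the one from the standard
library and may not match this draft.  The class name ``LoggingPolicy``,
the logger name, and the sample message are hypothetical, and an in-memory
stream stands in for the log file.

```python
import io
import logging
from email import message_from_string
from email.policy import EmailPolicy

# An in-memory stream stands in for a real log file in this sketch.
log_stream = io.StringIO()
log = logging.getLogger("email_defects_sketch")  # hypothetical name
log.addHandler(logging.StreamHandler(log_stream))
log.setLevel(logging.WARNING)
log.propagate = False


class LoggingPolicy(EmailPolicy):
    """Hypothetical policy that logs each defect as it is registered."""

    def register_defect(self, obj, defect):
        log.warning("email defect: %s", type(defect).__name__)
        # Keep the default behavior of recording the defect on the object.
        super().register_defect(obj, defect)


# A multipart message with no boundary parameter, to trigger a defect.
raw = "Content-Type: multipart/mixed\n\nbody without a boundary\n"
msg = message_from_string(raw, policy=LoggingPolicy())

logged = log_stream.getvalue()
defect_names = [type(d).__name__ for d in msg.defects]
print(logged, end="")
```

Because the override still calls the base method, ``msg.defects`` is
populated as usual and the log is purely an extra record.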

Now we'll see what the Email SIG thinks of this implementation, and
meanwhile I'll be adding policy arguments to the parser and generator
classes.