[Email-SIG] fixing the current email module

Thu Oct 8 09:16:47 CEST 2009

I'd like to try to summarize what I understand Barry to be saying (which,
in this case, also reflects my understanding of what is needed), and
see if I'm anywhere close to on target :)  In the following discussion,
'text' refers to Unicode data, and 'bytes' refers to, well, bytes.  (I
chose to use 'text' instead of 'string' to avoid confusion).

The email package consists of two major conceptual pieces: the API, and
the internal data model.  The API needs to have facilities for accepting
data in either text format or bytes format, and this data is used to
generate a model of the input message (a Message).  Likewise the API needs
to provide facilities for serializing a Message as either bytes or text.
The API also provides ways to build up a Message from pieces, or to
extract information from a Message in pieces, and to modify a Message,
and again input and output as both text and bytes must be supported.
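
For illustration only, here is a minimal sketch of that API surface using
names from the email package in today's Python 3 standard library
(message_from_string, message_from_bytes, as_string, as_bytes).  Those names
are an assumption about one possible shape of the API, not part of the
summary above, and the sample message is made up.

```python
import email
from email.message import Message

# A made-up sample message, once as text and once as bytes.
TEXT = "From: a@example.com\nSubject: hi\n\nbody\n"
RAW = TEXT.encode("ascii")

# Input: build a Message model from either text or bytes.
from_text = email.message_from_string(TEXT)
from_bytes = email.message_from_bytes(RAW)

# Piecewise construction: build up a Message from parts.
built = Message()
built["From"] = "a@example.com"
built["Subject"] = "hi"
built.set_payload("body\n")

# Piecewise extraction: pull individual pieces back out.
subject = from_text["Subject"]
payload = from_bytes.get_payload()

# Output: serialize the same model as text or as bytes.
as_text = built.as_string()
as_bytes = built.as_bytes()
```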

The data model used by the email package is an "implementation detail",
and we should not spend effort at this stage trying to optimize it for
anything except memory requirements with respect to potentially large
sub-objects, and even there it is more a matter of providing ways to
deal with potentially large sub-objects than it is a true optimization.
In general correctness and robustness are much more important than speed.

The data model will need to be a practical hybrid of the input data,
possibly transformed in some way in some cases, and various sorts of
meta-data.  The current email package already works this way.

An important characteristic of the model is that it be idempotent whenever
sensible; that is, if a given byte stream is used to create a Message
or subobject, serializing that Message or subobject as bytes should
return the original byte stream whenever sensible (i.e., when the data
is not pathologically malformed).  Likewise if a text stream is used to
create a Message or subobject, serializing it as text should produce,
whenever sensible, the original text stream.  In particular, well-formed
(per RFC) message data should always be stored and produced
idempotently.
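
The round-trip property described above can be written down as a check.
This is a sketch under assumptions: it uses the parse and serialize
functions of today's standard library email package, the helper names are
hypothetical, and the sample is a deliberately simple well-formed message.

```python
import email

# A small, well-formed sample message (made up for this sketch).
RAW = (b"From: a@example.com\n"
       b"To: b@example.com\n"
       b"Subject: hello\n"
       b"\n"
       b"Body text\n")


def roundtrips_as_bytes(raw):
    """True if parsing bytes and serializing as bytes returns the input."""
    return email.message_from_bytes(raw).as_bytes() == raw


def roundtrips_as_text(text):
    """True if parsing text and serializing as text returns the input."""
    return email.message_from_string(text).as_string() == text


bytes_ok = roundtrips_as_bytes(RAW)
text_ok = roundtrips_as_text(RAW.decode("ascii"))
```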

An important property of the API is that neither the parser that transforms
an input stream into a Message nor Message serialization should raise
exceptions, except in the face of errors that leave no way to produce a
valid Message or serialization.  Instead a defects list is maintained
and exposed through the API.  In the face of some defects it may not be
sensible to maintain idempotency.
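
As a concrete illustration, the email package in today's standard library
records parse problems on a per-message defects list instead of raising.
The sketch below assumes that behaviour and the NoBoundaryInMultipartDefect
class from email.errors; the broken input is made up.

```python
import email
from email import errors

# Claims to be multipart but supplies no boundary parameter, so the
# body cannot be split into parts.
BROKEN = ("Content-Type: multipart/mixed\n"
          "\n"
          "no way to split this into parts\n")

# Parsing does not raise; the problem is recorded on the Message.
msg = email.message_from_string(BROKEN)
defect_types = [type(d) for d in msg.defects]

# Serialization still succeeds despite the defect.
serialized = msg.as_string()
```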

The APIs that manipulate the data model either for piecewise construction
or for transformations may raise exceptions, and in most cases _should_
raise exceptions when encountering invalid data or operations.
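
By contrast with parsing, here is a sketch of manipulation calls that do
raise.  It assumes the behaviour of today's standard library (EmailMessage
rejecting a header value containing a newline, and Message.attach refusing
a non-multipart payload); the helper function names are hypothetical.

```python
from email.message import EmailMessage, Message


def set_subject(value):
    """Return the exception raised while setting Subject, or None."""
    msg = EmailMessage()
    try:
        msg["Subject"] = value
    except ValueError as exc:
        # Invalid data: a header value may not span several lines.
        return exc
    return None


def attach_to_plain_text():
    """Return the exception raised by attaching to a non-multipart."""
    msg = Message()
    msg.set_payload("just text, not a list of parts")
    try:
        msg.attach(Message())
    except TypeError as exc:
        # Invalid operation: the payload is a string, not a list of parts.
        return exc
    return None


good = set_subject("weekly report")
bad = set_subject("weekly report\nInjected: header")
bad_attach = attach_to_plain_text()
```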

Also, as an additional note to those thinking about use cases, I'd
like to point out something I know well and which Barry reminded me
about recently:  parts of the email package (e.g., MIME and RFC822-style
header parsing) are used, or can be used, by systems that do not handle
email.  The particular cases I have run into myself are working
with non-email data files that follow RFC822 rules, and handling data
from NNTP (which, granted, is almost email...but only almost).  In
the former case you usually have text input and output, mediated
by the encoding of the file(s) on disk.  In the latter case you have
all the problems of email plus a few more.
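
A sketch of the first case, text in and text out with a non-email data file
that follows RFC822 rules.  The record and its field names are hypothetical,
and the Parser class is the one in today's standard library.

```python
from email.parser import Parser

# A made-up, non-email record: header-style fields, a blank line,
# then free-form text.
RECORD = ("Name: example-package\n"
          "Version: 1.0\n"
          "\n"
          "Free-form text after the blank line.\n")

record = Parser().parsestr(RECORD)

# Field lookup is case-insensitive, as with email headers.
name = record["name"]
version = record["Version"]
body = record.get_payload()
```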

Further, in the standard library the http package, urllib, the cgi
module, and pydoc are all clients of the email package.

--David (RDM)