[Email-SIG] Thoughts on the general API, and the Header API.

Barry Warsaw barry at python.org
Sat Feb 20 03:23:52 CET 2010


On Jan 25, 2010, at 03:10 PM, R. David Murray wrote:

>After setting it aside for a bit, I had what I think is a little epiphany:
>our need is to deal with messages (and parts of messages) that could be
>in either bytes form or text form.  The things we need to do with them
>are similar regardless of their form, and so we have been talking about a
>"dual API": one method for bytes and a parallel method for text.
>
>What if we recognize that we have two different data types, bytes messages
>and text messages?  Then the "dual API" becomes a more uniform, almost
>single, API, but with two possible underlying data types.

I really like this, especially because it kind of mirrors the transformations
between bytes and strings.  I have one suggestion that might clean up the API
and make some other things possible or easier.

>In the context specifically of the proposed new Header object, I propose
>that we have a StringHeader and a BytesHeader, and an API that looks
>something like this:
>
>StringHeader
>
>    properties:
>        raw_header (None unless from_full_header was used)
>        raw_name
>        raw_value
>        name
>        value
>
>    __init__(name, value)
>    from_full_header(header)
>    serialize(max_line_len=78,
>              newline='\n',
>              use_raw_data_if_possible=False)
>    encode(charset='utf-8')
>
>BytesHeader would be exactly the same, with the exception of the signature
>for serialize and the fact that it has a 'decode' method rather than an
>'encode' method.  Serialize would be different only in the fact that
>it would have an additional keyword parameter, must_be_7bit=True.

The one thing that I think is unwieldy is the signature of the serialize() and
deserialize() methods.  I've been thinking about "policy" objects that can be
used to control formatting and I think that perhaps substituting an API like
this might work:

serialize(policy=None)
deserialize(policy=None)

The idea is that the policy object would describe how and when to fold header
lines and what EOL characters to use, as well as choices such as whether to
use raw data if possible and whether the output must be 7-bit (must_be_7bit).
A first-order improvement is that it would be much easier to pass a single
policy object up and down the call stack than a slew of independent parameters.
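
To make that concrete, here is a rough sketch of what I have in mind.  The
Policy class, its attribute names (borrowed from the keyword parameters in
your proposal) and the naive folding via textwrap are all assumptions for
illustration, not a worked-out design:

```python
from dataclasses import dataclass
import textwrap


@dataclass(frozen=True)
class Policy:
    # Hypothetical attributes, named after the keyword parameters
    # in the proposed serialize() signature.
    max_line_len: int = 78
    newline: str = "\n"
    use_raw_data_if_possible: bool = False
    must_be_7bit: bool = True  # carried along, but unused in this sketch


DEFAULT_POLICY = Policy()


class StringHeader:
    def __init__(self, name, value, raw_header=None):
        self.name = name
        self.value = value
        self.raw_header = raw_header

    def serialize(self, policy=None):
        # Fall back to a module-level default when no policy is given.
        if policy is None:
            policy = DEFAULT_POLICY
        if policy.use_raw_data_if_possible and self.raw_header is not None:
            return self.raw_header
        # Naive folding at whitespace; continuation lines start with a
        # space.  Real header folding has more rules than this.
        lines = textwrap.wrap(
            f"{self.name}: {self.value}",
            width=policy.max_line_len,
            subsequent_indent=" ",
            break_long_words=False,
            break_on_hyphens=False,
        )
        return policy.newline.join(lines) + policy.newline
```

A caller would then write header.serialize(Policy(newline="\r\n")) and the
signature stays the same no matter how many formatting knobs we add later.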

Further, it might be interesting to allow policy objects in the generator,
which would control default formatting options, and on Message objects in the
hierarchy, where a policy would control formatting for that Message and all
the ones below it in the tree (unless overridden by a policy object on a
sub-message).  Maybe headers themselves could also support policy objects.
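
The lookup I'm imagining is "nearest policy wins", falling back to the
generator's default.  A minimal sketch, where the Message class, its parent
link and the effective_policy() method are hypothetical names I'm making up
here:

```python
class Message:
    def __init__(self, policy=None):
        self.policy = policy   # None means "inherit from my parent"
        self.parent = None
        self.parts = []

    def attach(self, part):
        part.parent = self
        self.parts.append(part)

    def effective_policy(self, default=None):
        # Walk up the tree until some ancestor (or this message itself)
        # has an explicit policy; otherwise use the generator's default.
        node = self
        while node is not None:
            if node.policy is not None:
                return node.policy
            node = node.parent
        return default
```

With this, setting a policy on a multipart container affects every subpart
that doesn't carry its own, and the generator only has to supply the default.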

I think this could be interesting for supporting output of the same message
tree to different destinations.  E.g. if the message is being output directly
to an SMTP server, you'd stick a policy object on there that specified the EOL
required by RFC 5321 (CRLF), but you'd use a different policy object for output
to a web server.
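
Something along these lines, say.  OutputPolicy, the two policy instances and
render() are hypothetical names, and I'm assuming purely for illustration that
the web destination wants bare LF line endings:

```python
from collections import namedtuple

# Hypothetical policy type carrying only the EOL choice.
OutputPolicy = namedtuple("OutputPolicy", ["newline"])

SMTP_POLICY = OutputPolicy(newline="\r\n")  # CRLF, as RFC 5321 requires
WEB_POLICY = OutputPolicy(newline="\n")     # assumed for illustration


def render(header_lines, body_lines, policy):
    # A blank line separates the headers from the body.
    lines = list(header_lines) + [""] + list(body_lines)
    return policy.newline.join(lines) + policy.newline
```

The message data is identical in both cases; only the policy handed to the
output step changes.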

>(Encoding or decoding a Message would cause the Message to recursively
>encode or decode its subparts.  This means you are making a complete
>new copy of the Message in memory.  If you don't want to do that you
>can walk the Message and convert it piece by piece (we could provide a
>generator that does this).)

It sounds like there's overlap between the encoding/decoding API and the
serialize/deserialize API.  Are you thinking along those lines?  Differences
in signature could be papered over with the policy objects.

>Subclasses of these classes for structured headers would have additional
>methods that would return either specialized object types (datetimes,
>address objects) or bytes/strings, and these may or may not exist in
>both Bytes and String forms (that depends on the use cases, I think).

Is it crackful to think about the policy object also containing a MIME type
registry for conversion to the specialized object types?

>So, those are my thoughts, and I'm sure I haven't thought of all the
>corner cases.  The biggest question is, does it seem like this general
>scheme is worth pursuing? 

Definitely!  I think it's a great idea.

-Barry