[Email-SIG] rfc822 parser (the elephant has landed)

Barry Warsaw barry at python.org
Wed Jun 8 22:48:50 CEST 2011


On Jun 08, 2011, at 02:28 PM, R. David Murray wrote:

>Things have been a bit disrupted in my life over the past month (a family
>tragedy).

I'm very sorry to hear this, David.  My thoughts are with you.

As always, thanks for your amazing work on email6.  You are my hero.
Comments:

* Changing the __setitem__ API.  I've always thought about this as a pure
  convenience, and that appending was the most convenient semantics.  Other
  methods, e.g. replace_header(), should be included to provide the range of
  semantics that people want.  Then we'd just pick one and alias it to
  __setitem__.  I'm mixed as to whether appending still is the most convenient
  alias, since in my own code I often `del msg[header]; msg[header] = foo`.
  But that also changes the header order so it's not a perfect replacement.

* Unique headers: is this controlled or influenced by a policy?  For example,
  duplicate Subjects might be disallowed by RFC 5322, but could conceivably be
  allowed (or at least not prohibited) by other email-like protocols.

  Also, while some fields like CC allow only one occurrence, it can contain
  multiple values in that single field.  Is it totally insane to say that
  `msg['cc'] = 'address'` would append `address` to the existing value?  It
  probably is, but having to do that manually also kind of sucks.

  Some headers have other constraints (RFC 5322, §3.6).  For example,
  Message-ID can technically appear zero times, but "SHOULD be present".  Part
  of me thinks it should be out of scope for email6 to enforce this, and I'm
  not sure where that would get enforced anyway, but I'm just wondering if
  you've thought about that.

* Datetimes: \o/.  It will be awesome when I can `msg['date'] = a_datetime`.
  While it does seem reasonable that a naive datetime uses -0000, it should
  also be very easy for folks to add a Date header that references the local
  timezone, since I suspect that will be a more common use case than UTC.  I
  don't know what the answer for that is though.

* As for header parsing, have you looked at the pyparsing module?  I don't
  write many parsers, and have no direct experience with pyparsing, but I keep
  hearing really good things about it.  OTOH, it's not in the stdlib, so it
  would present problems if email6 were to adopt it.  Still, I don't envy this
  part of the job, and I sympathize with the rabbit-hole effect of "just one
  more little thing..." ;)  Oh, and I'm just blown away impressed by the work
  you've done on the parser.

* Are there operations on Groups and Mailboxes?  E.g. in your example, I see
  that you added `dinsdale at python.org` to the To header by string
  concatenation.  What if, for example, I had a number of addresses that I
  wanted to combine into a Reply-To header (which RFC 5322 says I can only
  have one of)?  Would I be able to do something like the following:

  >>> msg['reply_to'].mailboxes.append('another@example.com')

  and have the printed representation of the message look correct?  Ah, maybe
  something like your last example in the What's Missing section covers this.

* Oooh!  Your example has an `== None` which should probably be `is None` :)
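
A minimal sketch of the trade-off in the first comment, using the existing
stdlib email.message.Message API rather than anything email6-specific.  The
build() helper and the header values are made up for illustration; the point
is only how each approach affects duplicates and header order:

```python
from email.message import Message


def build():
    # Hypothetical three-header message used for each experiment below.
    msg = Message()
    msg['From'] = 'a@example.com'
    msg['Subject'] = 'old'
    msg['To'] = 'b@example.com'
    return msg


# 1. Plain assignment appends, so the message ends up with two Subjects.
appended = build()
appended['Subject'] = 'new'
print(appended.get_all('Subject'))

# 2. del + assignment leaves one Subject, but moves it to the end.
reordered = build()
del reordered['Subject']
reordered['Subject'] = 'new'
print(reordered.keys())

# 3. replace_header() leaves one Subject and keeps its original position.
replaced = build()
replaced.replace_header('Subject', 'new')
print(replaced.keys())
```

Note that replace_header() raises KeyError when the header is absent, so it
is not a drop-in alias for assignment either.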

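For the CC and Reply-To questions, here is a rough sketch of what "append an
address to the one allowed occurrence" looks like when done by hand with the
current stdlib.  append_address() is a hypothetical helper, not part of the
email package or of email6, and the addresses are placeholders:

```python
from email.message import Message
from email.utils import formataddr, getaddresses


def append_address(msg, header, address):
    """Hypothetical helper: add `address` to a single-occurrence header."""
    existing = msg.get_all(header, [])
    # Parse the current value(s) plus the new address into (name, addr) pairs.
    pairs = getaddresses(existing + [address])
    value = ', '.join(formataddr(pair) for pair in pairs)
    if existing:
        # Rewrite the first occurrence in place so header order is preserved.
        msg.replace_header(header, value)
    else:
        msg[header] = value


msg = Message()
msg['Reply-To'] = 'list@example.com'
append_address(msg, 'Reply-To', 'Another Person <another@example.com>')
print(msg['Reply-To'])
```

This is the manual parse-and-rejoin step that an operation on the header's
mailbox list would presumably hide.
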
Really, *really* fantastic stuff.
-Barry