[Email-SIG] rfc822 parser (the elephant has landed)
Barry Warsaw
barry at python.org
Wed Jun 8 22:48:50 CEST 2011
On Jun 08, 2011, at 02:28 PM, R. David Murray wrote:
>Things have been a bit disrupted in my life over the past month (a family
>tragedy).
I'm very sorry to hear this, David. My thoughts are with you.
As always, thanks for your amazing work on email6. You are my hero.
Comments:
* Changing the __setitem__ API. I've always thought about this as a pure
convenience, and that appending was the most convenient semantics. Other
methods, e.g. replace_header(), should be included to provide the range of
semantics that people want. Then we'd just pick one and alias it to
__setitem__. I'm mixed as to whether appending still is the most convenient
alias, since in my own code I often `del msg[header]; msg[header] = foo`.
But that also changes the header order so it's not a perfect replacement.
* Unique headers: is this controlled or influenced by a policy? For example,
duplicate Subjects might be disallowed by RFC 5322, but could conceivably be
allowed (or at least not prohibited) by other email-like protocols.
Also, while some fields like CC allow only one occurrence, that single field
can contain multiple values. Is it totally insane to say that
`msg['cc'] = 'address'` would append `address` to the existing value? It
probably is, but having to do that manually also kind of sucks.
Some headers have other constraints (RFC 5322, §3.6). For example,
Message-ID can technically appear zero times, but "SHOULD be present". Part
of me thinks it should be out of scope for email6 to enforce this, and I'm
not sure where that would get enforced anyway, but I'm just wondering if
you've thought about that.
* Datetimes: \o/. It will be awesome when I can `msg['date'] = a_datetime`.
While it does seem reasonable that a naive datetime uses -0000, it should
also be very easy for folks to add a Date header that references the local
timezone, since I suspect that will be a more common use case than UTC. I
don't know what the answer for that is though.
* As for header parsing, have you looked at the pyparsing module? I don't
write many parsers, and have no direct experience with pyparsing, but I keep
hearing really good things about it. OTOH, it's not in the stdlib, so it
would present problems if email6 were to adopt it. Still, I don't envy this
part of the job, and I sympathize with the rabbit-hole effect of "just one
more little thing..." ;) Oh, and I'm just blown away impressed by the work
you've done on the parser.
* Are there operations on Groups and Mailboxes? E.g. in your example, I see
that you added `dinsdale@python.org` to the To header by string
concatenation. What if, for example, I had a number of addresses that I
wanted to combine into a Reply-To header (which RFC 5322 says I can only
have one of). Would I be able to do something like the following:
>>> msg['reply_to'].mailboxes.append('another@example.com')
and have the printed representation of the message look correct? Ah, maybe
something like your last example in the What's Missing section covers this.
* Oooh! Your example has an `== None` which should probably be `is None` :)
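To make the __setitem__ and Datetimes comments above concrete, here is a minimal sketch of how the existing email package behaves. It uses only calls that are already there (Message.__setitem__, Message.__delitem__, Message.replace_header() and email.utils.formatdate()). The helper function names, addresses and header values are made up for illustration, and none of this is meant to describe the email6 API.

```python
from email.message import Message
from email.utils import formatdate, parsedate_tz


def build_message():
    """Return a small message with three headers in a known order."""
    msg = Message()
    msg['From'] = 'a@example.com'
    msg['Subject'] = 'first'
    msg['To'] = 'b@example.com'
    return msg


def append_semantics():
    """Assigning to an existing header appends a second header."""
    msg = build_message()
    msg['Subject'] = 'second'
    return msg.keys(), msg.get_all('Subject')


def delete_then_set():
    """del + assign leaves one header, but moves it to the end."""
    msg = build_message()
    del msg['Subject']
    msg['Subject'] = 'second'
    return msg.keys()


def replace_in_place():
    """replace_header() keeps the header's original position."""
    msg = build_message()
    msg.replace_header('Subject', 'second')
    return msg.keys(), msg['Subject']


def local_date_header():
    """Set a Date header that carries the local timezone offset."""
    msg = Message()
    msg['Date'] = formatdate(localtime=True)
    return msg['Date']


if __name__ == '__main__':
    print(append_semantics())
    print(delete_then_set())
    print(replace_in_place())
    print(local_date_header())
```

The second and third functions show the trade-off: delete-then-set gets rid of the duplicate but reorders the headers, while replace_header() preserves the order but raises KeyError when the header is not already present.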
Really, *really* fantastic stuff.
-Barry