From rdmurray at bitdance.com Wed Jun 8 20:28:13 2011 From: rdmurray at bitdance.com (R. David Murray) Date: Wed, 08 Jun 2011 14:28:13 -0400 Subject: [Email-SIG] rfc822 parser (the elephant has landed) Message-ID: <20110608182814.8BD2625012E@webabinitio.net> Things have been a bit disrupted in my life over the past month (a family tragedy). Fortunately for this group one of my ways of coping is to write code, so I did manage to do a fair bit on the email6 project, I just haven't been keeping up with publishing about it consistently. I did write one other blog post before today's. Here are the two links: http://www.bitdance.com/blog/2011/05/23_01_Email6_Headers_and_Header_Classes/ http://www.bitdance.com/blog/2011/06/08_01_Email6_RFC822_Parser/ The big thing is an RFC822 parser. I should probably have asked for advice here before plunging in to it, but it seemed reasonably straightforward when I started :). And it still seems simple in outline, just complex in details. Take a look and give me whatever feedback you've got. -- R. David Murray http://www.bitdance.com From vikasruhil06 at gmail.com Wed Jun 8 20:35:52 2011 From: vikasruhil06 at gmail.com (vikas ruhil) Date: Thu, 9 Jun 2011 00:05:52 +0530 Subject: [Email-SIG] rfc822 parser (the elephant has landed) In-Reply-To: <20110608182814.8BD2625012E@webabinitio.net> References: <20110608182814.8BD2625012E@webabinitio.net> Message-ID: hey suggest i am looking for a web based mail service for FOSS comunity like gmail can anybody suggest from where i am to start ? should i use sendmail,posfix mail server or lamson mail server ? help me plz On Wed, Jun 8, 2011 at 11:58 PM, R. David Murray wrote: > Things have been a bit disrupted in my life over the past month (a family > tragedy). Fortunately for this group one of my ways of coping is to write > code, so I did manage to do a fair bit on the email6 project, I just > haven't > been keeping up with publishing about it consistently. 
I did write one other
> blog post before today's. Here are the two links:
>
> http://www.bitdance.com/blog/2011/05/23_01_Email6_Headers_and_Header_Classes/
> http://www.bitdance.com/blog/2011/06/08_01_Email6_RFC822_Parser/
>
> The big thing is an RFC822 parser. I should probably have asked for advice
> here before plunging in to it, but it seemed reasonably straightforward
> when I started :). And it still seems simple in outline, just complex in
> details.
>
> Take a look and give me whatever feedback you've got.
>
> --
> R. David Murray http://www.bitdance.com

From barry at python.org Wed Jun 8 22:48:50 2011
From: barry at python.org (Barry Warsaw)
Date: Wed, 8 Jun 2011 16:48:50 -0400
Subject: [Email-SIG] rfc822 parser (the elephant has landed)
In-Reply-To: <20110608182814.8BD2625012E@webabinitio.net>
References: <20110608182814.8BD2625012E@webabinitio.net>
Message-ID: <20110608164850.451f5344@neurotica.wooz.org>

On Jun 08, 2011, at 02:28 PM, R. David Murray wrote:

>Things have been a bit disrupted in my life over the past month (a family
>tragedy).

I'm very sorry to hear this David. My thoughts are with you.

As always, thanks for your amazing work on email6. You are my hero.

Comments:

* Changing the __setitem__ API. I've always thought about this as a pure
  convenience, and that appending was the most convenient semantics. Other
  methods, e.g. replace_header() should be included to provide the range of
  semantics that people want. Then we'd just pick one and alias it to
  __setitem__. I'm mixed as to whether appending still is the most convenient
  alias, since in my own code I often `del msg[header]; msg[header] = foo`.
  But that also changes the header order so it's not a perfect replacement.
* Unique headers: is this controlled or influenced by a policy? For example, duplicate Subjects might be disallowed by RFC 5322, but could conceivably be allowed (or at least not prohibited) by other email-like protocols. Also, while some fields like CC allow only occurrence, it can contain multiple values in that single field. Is it totally insane to say that `msg['cc'] = 'address'` would append `address` to the existing value? It probably is, but having to do that manually also kind of sucks. Some headers have other constraints (RFC 5322, $3.6). For example Message-ID can technically appear zero times, but "SHOULD be present". Part of me thinks it should be out of scope for email6 to enforce this, and I'm not sure where that would get enforced anyway, but I'm just wondering if you've thought about that. * Datetimes: \o/. It will be awesome when I can `msg['date'] = a_datetime`. While it does seem reasonable that a naive datetime uses -0000, it should also be very easy for folks to add a Date header that references the local timezone, since I suspect that will be a more common use case than UTC. I don't know what the answer for that is though. * As for header parsing, have you looked at the pyparsing module? I don't write many parsers, and have no direct experience with pyparsing, but I keep hearing really good things about it. OTOH, it's not in the stdlib, so it would present problems if email6 were to adopt it. Still, I don't envy this part of the job, and I sympathize with the rabbit-hole effect of "just one more little thing..." ;) Oh, and I'm just blown away impressed by the work you've done on the parser. * Are there operations on Groups and Mailboxes? E.g. in your example, I see that you added `dinsdale at python.org` to the To header by string concatenation. What if for example, I had a number of addresses that I wanted to combine into a Reply-To header (which RFC 5322 says I can only have one of). 
Would I be able to do something like the following:

>>> msg['reply_to'].mailboxes.append('another at example.com')

and have the printed representation of the message look correct? Ah, maybe
something like your last example in the What's Missing section covers this.

* Oooh! Your example has an `== None` which should probably be `is None` :)

Really, *really* fantastic stuff.

-Barry

From barry at python.org Wed Jun 8 22:51:33 2011
From: barry at python.org (Barry Warsaw)
Date: Wed, 8 Jun 2011 16:51:33 -0400
Subject: [Email-SIG] Modoboa (was Re: rfc822 parser (the elephant has landed))
In-Reply-To:
References: <20110608182814.8BD2625012E@webabinitio.net>
Message-ID: <20110608165133.4cabec71@neurotica.wooz.org>

On Jun 09, 2011, at 12:05 AM, vikas ruhil wrote:

>hey suggest i am looking for a web based mail service for FOSS comunity
>like gmail can anybody suggest from where i am to start ? should i use
>sendmail,posfix mail server or lamson mail server ? help me plz

Modoboa perhaps?

http://modoboa.org/

-Barry

From rdmurray at bitdance.com Thu Jun 9 00:46:53 2011
From: rdmurray at bitdance.com (R. David Murray)
Date: Wed, 08 Jun 2011 18:46:53 -0400
Subject: [Email-SIG] rfc822 parser (the elephant has landed)
In-Reply-To: <20110608164850.451f5344@neurotica.wooz.org>
References: <20110608182814.8BD2625012E@webabinitio.net> <20110608164850.451f5344@neurotica.wooz.org>
Message-ID: <20110608224655.0C84E25012E@webabinitio.net>

On Wed, 08 Jun 2011 16:48:50 -0400, Barry Warsaw wrote:
> * Changing the __setitem__ API. I've always thought about this as a pure
> convenience, and that appending was the most convenient semantics.
Other > methods, e.g. replace_header() should be included to provide the range of > semantics that people want. Then we'd just pick one and alias it to > __setitem__. I'm mixed as to whether appending still is the most convenient > alias, since in my own code I often `del msg[header]; msg[header] = foo`. > But that also changes the header order so it's not a perfect replacement. Yeah, it would be really nice if setting (say) 'To' replaced it, but setting (say) 'Resent-To' appended. But that way lies chaos :) One of my ideas is to eventually decouple the header dictionary from the Message. That is, you access the headers through msg.headers instead of directly on msg. At that point we could get away with changing the semantics of __setitem__, and have msg.headers[X] be 'replace'. Having append be spelled 'msg.headers.append(X)' seems slightly more natural than having replace spelled msg.headers.replace(X), so that's what I'd be in favor of. > * Unique headers: is this controlled or influenced by a policy? For example, > duplicate Subjects might be disallowed by RFC 5322, but could conceivably be > allowed (or at least not prohibited) by other email-like protocols. Right now it is always applied, but IMO it needs to be a policy setting. So despite my thought that Messages don't have a policy, it turns out that they do :(. I haven't thought through how to handle that yet, though the obvious way is to set attributes on the Message when it is created. Perhaps what needs to be controlled on a Message is what Defects are considered to be errors that should be raised. An alternative would be to take the uniqueness check out of __setitem__ and do that check only at message generation time, if the policy says to do so. I'd prefer that the immediate raise be available as an option, myself, since it seems like it would catch programming errors sooner and thus make for a better user experience. 
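An illustrative aside, not part of the original message: the semantics being debated here (append versus replace in __setitem__, and an immediate raise on duplicate unique headers) can be tried out with the email package as it ships in current Python 3. That is an assumption about the reader's environment; it is not the email6 code under discussion in 2011. The sketch uses only stdlib calls (`Message`, `EmailMessage`, `replace_header`, `get_all`, `keys`), and the addresses are made up.

```python
from email.message import EmailMessage, Message

# Legacy Message (compat32 policy): __setitem__ appends a new header.
legacy = Message()
legacy['To'] = 'a@example.com'
legacy['Subject'] = 'first'
legacy['To'] = 'b@example.com'          # a second To header, no error
appended = legacy.get_all('To')

# replace_header() swaps the value in place and keeps the header order.
legacy.replace_header('Subject', 'second')
order_after_replace = legacy.keys()

# The del-then-set idiom removes every To header and re-adds one at the end,
# which is why it changes the header order.
del legacy['To']
legacy['To'] = 'c@example.com'
order_after_del_set = legacy.keys()

# EmailMessage (default policy) refuses a duplicate unique header at
# assignment time instead of waiting until the message is generated.
strict = EmailMessage()
strict['To'] = 'a@example.com'
try:
    strict['To'] = 'b@example.com'
    raised = False
except ValueError:
    raised = True
```

Here the choice between the two behaviours hangs off the message's policy, which is one possible answer to the "Messages turn out to have a policy" problem raised above.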
> Also, while some fields like CC allow only occurrence, it can contain > multiple values in that single field. Is it totally insane to say that > `msg['cc'] = 'address'` would append `address` to the existing value? It > probably is, but having to do that manually also kind of sucks. Yeah I think that would be insane :). But += isn't and I want to support that, as you note later. > Some headers have other constraints (RFC 5322, $3.6). For example > Message-ID can technically appear zero times, but "SHOULD be present". Part > of me thinks it should be out of scope for email6 to enforce this, and I'm > not sure where that would get enforced anyway, but I'm just wondering if > you've thought about that. That one I think can only be enforced when the message is known to be "complete", which would be when it is transmitted. So the generator could have a policy setting that controls whether or not a lack of a Message-ID is a raisable error. > * Datetimes: \o/. It will be awesome when I can `msg['date'] = a_datetime`. > While it does seem reasonable that a naive datetime uses -0000, it should > also be very easy for folks to add a Date header that references the local > timezone, since I suspect that will be a more common use case than UTC. I > don't know what the answer for that is though. Well, Alexander has an answer (a function that returns an aware localtime in the datetime module) but hasn't gotten consensus on adding it. Perhaps I'll add such a function to email6, at least for the field trials. > * As for header parsing, have you looked at the pyparsing module? I don't > write many parsers, and have no direct experience with pyparsing, but I keep > hearing really good things about it. OTOH, it's not in the stdlib, so it > would present problems if email6 were to adopt it. Still, I don't envy this > part of the job, and I sympathize with the rabbit-hole effect of "just one > more little thing..." 
;) Oh, and I'm just blown away impressed by the work > you've done on the parser. I thought about pyparsing (though I haven't tried it out myself), but I think its scope is much wider than email6 needs, and getting it in to the stdlib should be an independent project if doing so seems worthwhile. I don't think email6 should depend on anything not already in the stdlib. In any case, at this point I think the hard part of the parser is done, and everything else is incremental additions and tweaks. Something I didn't say in my blog post is that I'm thinking of marking rfc822_parser as a private module for the 3.3 release, but that a long term goal would be to expose it, if it proves to be worthwhile and useful apart from its internal use in email6. I think there are occasions when programs need to do non-email rfc822 parsing, where it could come in handy (perhaps with a few API tweaks to optionally suppress email-specific hacks). Alternatively, the parser might get replaced by something else that does the same job, especially if it proves to be a performance bottleneck. > * Are there operations on Groups and Mailboxes? E.g. in your example, I see > that you added `dinsdale at python.org` to the To header by string > concatenation. What if for example, I had a number of addresses that I > wanted to combine into a Reply-To header (which RFC 5322 says I can only > have one of). Would I be able to do something like the following: > > >>> msg['reply_to'].mailboxes.append('another at example.com') > > and have the printed representation of the message look correct? Ah, maybe > something like your last example in the What's Missing section covers this. Yes. Headers are immutable, so 'append' is not the appropriate operation for this. + or += is. 
What I'm thinking is that the current Mailbox and Group objects should be
enhanced so that there is a nice API for creating them from various kinds of
input data, and an explicit AddressList object added, and then they can be
passed around, summed, and maybe even subtracted with each other and with
AddressList-valued header fields.

> * Oooh! Your example has an `== None` which should probably be `is None` :)

Heh. Oops :) At least I ran the doc tests this time before posting.

> Really, *really* fantastic stuff.

Thanks.

--
R. David Murray http://www.bitdance.com

From stephen at xemacs.org Thu Jun 9 09:45:08 2011
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Thu, 09 Jun 2011 16:45:08 +0900
Subject: [Email-SIG] rfc822 parser (the elephant has landed)
In-Reply-To: <20110608224655.0C84E25012E@webabinitio.net>
References: <20110608182814.8BD2625012E@webabinitio.net> <20110608164850.451f5344@neurotica.wooz.org> <20110608224655.0C84E25012E@webabinitio.net>
Message-ID: <871uz3o3cr.fsf@uwakimon.sk.tsukuba.ac.jp>

R. David Murray writes:

> Yeah, it would be really nice if setting (say) 'To' replaced it, but
> setting (say) 'Resent-To' appended. But that way lies chaos :)

Especially since "Resent-To" (and other Resent-*, as well as trace
headers) needs to be *pre*pended. :)

> One of my ideas is to eventually decouple the header dictionary from the
> Message.

I don't understand why you want to do that; in many applications, you
pass around a reference to the body but never need to access it until
a final flattening operation. The headers are naturally structured as
a list or ordered dictionary. Bodies OTOH are recursively structured,
so they really can't be handled in the same way.

> > * Unique headers: is this controlled or influenced by a policy? For example,
> > duplicate Subjects might be disallowed by RFC 5322, but could conceivably be
> > allowed (or at least not prohibited) by other email-like protocols.
> > Right now it is always applied, but IMO it needs to be a policy > setting. Yes. The Postel Principle applies here. > > Also, while some fields like CC allow only occurrence, it can contain > > multiple values in that single field. Is it totally insane to say that > > `msg['cc'] = 'address'` would append `address` to the existing value? It > > probably is, but having to do that manually also kind of sucks. > > Yeah I think that would be insane :). +1 for insanity. > But += isn't and I want to support that, as you note later. +1 for += (and perhaps -=). > > Some headers have other constraints (RFC 5322, $3.6). For example > > Message-ID can technically appear zero times, but "SHOULD be present". Part > > of me thinks it should be out of scope for email6 to enforce this, and I'm > > not sure where that would get enforced anyway, but I'm just wondering if > > you've thought about that. > > That one I think can only be enforced when the message is known to be > "complete", which would be when it is transmitted. "Enforced", yes, it's out of scope, for several reasons. However, any given application may know at some early stage that headers are complete, and want to check policy at that point. So there should be a mechanism to explicitly check policy conformance, perhaps a .check_policy() method on Message objects. Then it becomes a question of whether the policy check should ever be called implicitly, or always left up to the application. From rdmurray at bitdance.com Fri Jun 10 15:58:26 2011 From: rdmurray at bitdance.com (R. 
David Murray) Date: Fri, 10 Jun 2011 09:58:26 -0400 Subject: [Email-SIG] rfc822 parser (the elephant has landed) In-Reply-To: <871uz3o3cr.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20110608182814.8BD2625012E@webabinitio.net> <20110608164850.451f5344@neurotica.wooz.org> <20110608224655.0C84E25012E@webabinitio.net> <871uz3o3cr.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20110610135827.0AD0D2505A0@webabinitio.net> On Thu, 09 Jun 2011 16:45:08 +0900, "Stephen J. Turnbull" wrote: > R. David Murray writes: > > > Yeah, it would be really nice if setting (say) 'To' replaced it, but > > setting (say) 'Resent-To' appended. But that way lies chaos :) > > Especially since "Resent-To" (and other Resent-*, as well as trace > headers) needs to be *pre*pended. :) Ah, right. Which means we don't support that currently... > > One of my ideas is to eventually decouple the header dictionary from the > > Message. > > I don't understand why you want to do that; in many applications, you > pass around a reference to the body but never need to access it until > a final flattening operation. The headers are naturally structured as > a list or ordered dictionary. Bodies OTOH are recursively structured, > so they really can't be handled in the same way. Well, the main motivation was so that I could change the semantics of __setitem__. > > > * Unique headers: is this controlled or influenced by a policy? For example, > > > duplicate Subjects might be disallowed by RFC 5322, but could conceivably be > > > allowed (or at least not prohibited) by other email-like protocols. > > > > Right now it is always applied, but IMO it needs to be a policy > > setting. > > Yes. The Postel Principle applies here. Well, that's already in place. The parser treats duplicate unique headers as a defect by default. But there needs to be a way to construct invalid messages, too, I think. Heh. 
And I forgot, there actually is a way with the current code to create duplicate headers, you just have to call append instead of using __setitem__. So maybe it wouldn't be totally crazy to have unique headers __setitem__ be replace while non-unique headers __setitem__ does append. We could even go really crazy and have Resent headers __setitem__ do prepend :) The other way to control this "unique header" behavior would be to change the header registry. If you are building an application whose headers do not conform to the RFC, you would probably end up doing that anyway. If you combine the last two ideas, we could have a carefully defined API for controlling how __setitem__ works using attributes on the header classes. Totally crazy? Crazy-smart? > > > Also, while some fields like CC allow only occurrence, it can contain > > > multiple values in that single field. Is it totally insane to say that > > > `msg['cc'] = 'address'` would append `address` to the existing value? It > > > probably is, but having to do that manually also kind of sucks. > > > > Yeah I think that would be insane :). > > +1 for insanity. Are you saying = should append to the value? I think that would be bad/counterintuitive. > > But += isn't and I want to support that, as you note later. > > +1 for += (and perhaps -=). Agreed. > > > Some headers have other constraints (RFC 5322, $3.6). For example > > > Message-ID can technically appear zero times, but "SHOULD be present". Part > > > of me thinks it should be out of scope for email6 to enforce this, and I'm > > > not sure where that would get enforced anyway, but I'm just wondering if > > > you've thought about that. > > > > That one I think can only be enforced when the message is known to be > > "complete", which would be when it is transmitted. > > "Enforced", yes, it's out of scope, for several reasons. However, any > given application may know at some early stage that headers are > complete, and want to check policy at that point. 
So there should be
> a mechanism to explicitly check policy conformance, perhaps a
> .check_policy() method on Message objects. Then it becomes a question
> of whether the policy check should ever be called implicitly, or
> always left up to the application.

How about a validate function that takes a message and a policy?
That would be parallel to generator. In fact, it might share some code
with generator.

--
R. David Murray http://www.bitdance.com

From merwok at netwok.org Fri Jun 10 19:00:32 2011
From: merwok at netwok.org (Éric Araujo)
Date: Fri, 10 Jun 2011 19:00:32 +0200
Subject: [Email-SIG] rfc822 parser (the elephant has landed)
In-Reply-To: <20110608182814.8BD2625012E@webabinitio.net>
References: <20110608182814.8BD2625012E@webabinitio.net>
Message-ID: <4DF24DB0.4060809@netwok.org>

Hi,

I know close to zilch about email but thought I'd give two eurocents.

The first cent is about subclassing builtins. I read in your article
that your code uses subclasses of str and list; can't that lead to
problems caused by fast paths for built-in types in CPython code? (if I
understand http://bugs.python.org/issue10977 correctly)

The second cent is about naming. Does a Mailbox represent an email
address? The confusion with mailbox.Mailbox would be a problem.

Dare I say it? PEP 8 would advise rfc822parser for the name, or parser
(but I don't know how you plan to deprecate/replace the existing
email.parser module).

I'm sorry for your family stuff.

Regards

From rdmurray at bitdance.com Fri Jun 10 20:27:39 2011
From: rdmurray at bitdance.com (R. David Murray)
Date: Fri, 10 Jun 2011 14:27:39 -0400
Subject: [Email-SIG] rfc822 parser (the elephant has landed)
In-Reply-To: <4DF24DB0.4060809@netwok.org>
References: <20110608182814.8BD2625012E@webabinitio.net> <4DF24DB0.4060809@netwok.org>
Message-ID: <20110610182740.897052505A0@webabinitio.net>

On Fri, 10 Jun 2011 19:00:32 +0200, wrote:
> The first cent is about subclassing builtins.
I read in your article
> that your code uses subclasses of str and list; can't that lead to
> problems caused by fast paths for built-in types in CPython code? (if I
> understand http://bugs.python.org/issue10977 correctly)

The problems there arise from C code calling (or, rather, not calling)
methods on the subclass. But in email6, headers act *just like* strings;
they simply have *extra* methods. So there should be no problem. Anything
that doesn't know about the extra methods will treat the header just like
a string, which is exactly what we want for backward compatibility reasons.

The one place where this might bite us is in the proposed support for +=
and -=. I haven't tested that yet, and if it does work I'm not sure that
there won't be obscure corners that will turn out to be broken.

> The second cent is about naming. Does a Mailbox represent an email
> address? The confusion with mailbox.Mailbox would be a problem.

Well, that is an issue. I'm not entirely happy about the name, but I
haven't thought of a better one. The problem is that we have to deal both
with a full 'mailbox' and the 'addr-spec' subpart, and I don't know of
*any* other name (other than 'addr-spec') for the addr-spec part. (Well,
'address', but you can see the problem with using that for both
meanings...) Perhaps it would be better to use that (or rather addr_spec),
and use 'address' for the address-with-display-name ('mailbox'). I'm open
to suggestions for better naming in the API.

> Dare I say it? PEP 8 would advise rfc822parser for the name, or parser
> (but I don't know how you plan to deprecate/replace the existing
> email.parser module).

Good point. rfc822parser is completely distinct from 'parser', which
probably won't get deprecated. On the other hand, once I add RFC2047
support to it, perhaps I should rename it rfcparser (or, at least at
first, _rfcparser). Or perhaps _headerparser, though it doesn't contain
*all* of the header parsing machinery.
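An illustrative aside, not part of the original message: the worry about += on str-based headers can be reproduced with a toy class. Everything below is hypothetical (the `Header` and `AddableHeader` names and the `name` attribute are invented for the sketch and are not email6's classes); it only shows the general Python behaviour that str methods and operators return plain `str` unless the subclass overrides them.

```python
class Header(str):
    """Hypothetical str subclass carrying one extra attribute."""

    def __new__(cls, name, value):
        self = super().__new__(cls, value)
        self.name = name
        return self


class AddableHeader(Header):
    """Same idea, but + (and therefore +=) keeps the subclass."""

    def __add__(self, other):
        return type(self)(self.name, str.__add__(self, other))


to = Header('To', 'a@example.com')
# Code that only knows about str sees an ordinary string.
same_as_str = (to == 'a@example.com')

# Without an override, concatenation falls back to str.__add__ and the
# result is a plain str that has lost the extra attribute.
plain = to + ', b@example.com'

kept = AddableHeader('To', 'a@example.com')
kept += ', b@example.com'   # no __iadd__, so this rebinds to __add__'s result
```

Because strings are immutable, `+=` always rebinds the name to a new object, which fits the "headers are immutable, use + or +=" design mentioned earlier in the thread; every operator meant to preserve the header type would need an override along these lines.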
> I'm sorry for your family stuff.

Thanks.

--
R. David Murray http://www.bitdance.com

From barry at python.org Fri Jun 10 22:42:49 2011
From: barry at python.org (Barry Warsaw)
Date: Fri, 10 Jun 2011 16:42:49 -0400
Subject: [Email-SIG] rfc822 parser (the elephant has landed)
In-Reply-To: <20110608224655.0C84E25012E@webabinitio.net>
References: <20110608182814.8BD2625012E@webabinitio.net> <20110608164850.451f5344@neurotica.wooz.org> <20110608224655.0C84E25012E@webabinitio.net>
Message-ID: <20110610164249.1f3db8d5@neurotica.wooz.org>

On Jun 08, 2011, at 06:46 PM, R. David Murray wrote:

>One of my ideas is to eventually decouple the header dictionary from the
>Message. That is, you access the headers through msg.headers instead
>of directly on msg. At that point we could get away with changing
>the semantics of __setitem__, and have msg.headers[X] be 'replace'.
>Having append be spelled 'msg.headers.append(X)' seems slightly more
>natural than having replace spelled msg.headers.replace(X), so that's
>what I'd be in favor of.

I agree that it probably does make sense to eventually relegate the headers
to msg.headers. But I think you'll want both .append() and .replace()
methods for explicitness, with one of them being mapped to __setitem__()
for convenience. Heck, as is pointed out elsewhere, __setitem__() will
probably be mapped to
.magical_rfc_compliant_manipulation_of_header(X, policy) anyway.

>An alternative would be to take the uniqueness check out of __setitem__
>and do that check only at message generation time, if the policy says to
>do so. I'd prefer that the immediate raise be available as an option,
>myself, since it seems like it would catch programming errors sooner
>and thus make for a better user experience.

Definitely.

>> Also, while some fields like CC allow only occurrence, it can contain
>> multiple values in that single field. Is it totally insane to say that
>> `msg['cc'] = 'address'` would append `address` to the existing value?
It >> probably is, but having to do that manually also kind of sucks. > >Yeah I think that would be insane :). But += isn't and I want to support >that, as you note later. +=1! >> Some headers have other constraints (RFC 5322, $3.6). For example >> Message-ID can technically appear zero times, but "SHOULD be present". Part >> of me thinks it should be out of scope for email6 to enforce this, and I'm >> not sure where that would get enforced anyway, but I'm just wondering if >> you've thought about that. > >That one I think can only be enforced when the message is known to be >"complete", which would be when it is transmitted. So the generator >could have a policy setting that controls whether or not a lack of >a Message-ID is a raisable error. It might also make sense for Messages to have a .validate(policy) method. The application using email6 should essentially know when it's done parsing or manipulating the message, so it could call .validate() at that point. >> * Datetimes: \o/. It will be awesome when I can `msg['date'] = a_datetime`. >> While it does seem reasonable that a naive datetime uses -0000, it should >> also be very easy for folks to add a Date header that references the local >> timezone, since I suspect that will be a more common use case than UTC. I >> don't know what the answer for that is though. > >Well, Alexander has an answer (a function that returns an aware localtime >in the datetime module) but hasn't gotten consensus on adding it. >Perhaps I'll add such a function to email6, at least for the field trials. Nice. >> * As for header parsing, have you looked at the pyparsing module? I don't >> write many parsers, and have no direct experience with pyparsing, but I keep >> hearing really good things about it. OTOH, it's not in the stdlib, so it >> would present problems if email6 were to adopt it. Still, I don't envy this >> part of the job, and I sympathize with the rabbit-hole effect of "just one >> more little thing..." 
;) Oh, and I'm just blown away impressed by the work >> you've done on the parser. > >I thought about pyparsing (though I haven't tried it out myself), but >I think its scope is much wider than email6 needs, and getting it in to >the stdlib should be an independent project if doing so seems worthwhile. >I don't think email6 should depend on anything not already in the stdlib. Agreed. >In any case, at this point I think the hard part of the parser is done, >and everything else is incremental additions and tweaks. > >Something I didn't say in my blog post is that I'm thinking of marking >rfc822_parser as a private module for the 3.3 release, but that a long >term goal would be to expose it, if it proves to be worthwhile and useful >apart from its internal use in email6. I think there are occasions when >programs need to do non-email rfc822 parsing, where it could come in handy >(perhaps with a few API tweaks to optionally suppress email-specific hacks). Again, agreed. There are *lots* of file formats that follow rfc822 style layouts. One that I'm particularly interested in these days is Debian control files. It's essentially rfc822 headers with no bodies, with sections separated by a blank line. It would be kind of neat if the stdlib could help me parse those. >Yes. Headers are immutable, so 'append' is not the appropriate operation >for this. + or += is. What I'm thinking is that the current Mailbox >and Group objects should be enhanced so that there is a nice API for >creating them from various kinds of input data, and an explicit AddresList >object added, and then they can be passed around, summed, and maybe even >subtracted with each other and with AddressList valued header fields. Sounds good to me. -Barry -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From barry at python.org Fri Jun 10 22:47:34 2011 From: barry at python.org (Barry Warsaw) Date: Fri, 10 Jun 2011 16:47:34 -0400 Subject: [Email-SIG] rfc822 parser (the elephant has landed) In-Reply-To: <871uz3o3cr.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20110608182814.8BD2625012E@webabinitio.net> <20110608164850.451f5344@neurotica.wooz.org> <20110608224655.0C84E25012E@webabinitio.net> <871uz3o3cr.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20110610164734.39f6819c@neurotica.wooz.org> On Jun 09, 2011, at 04:45 PM, Stephen J. Turnbull wrote: >R. David Murray writes: > > > Yeah, it would be really nice if setting (say) 'To' replaced it, but > > setting (say) 'Resent-To' appended. But that way lies chaos :) > >Especially since "Resent-To" (and other Resent-*, as well as trace >headers) needs to be *pre*pended. :) .insert(i, header) probably, where `i` could either (maybe) be an integer or the name of the first header to insert the new header before. >"Enforced", yes, it's out of scope, for several reasons. However, any >given application may know at some early stage that headers are >complete, and want to check policy at that point. So there should be >a mechanism to explicitly check policy conformance, perhaps a >.check_policy() method on Message objects. Then it becomes a question >of whether the policy check should ever be called implicitly, or >always left up to the application. Smart minds think alike. :) -Barry -------------- next part -------------- A non-text attachment was scrubbed... 
From barry at python.org Fri Jun 10 22:49:05 2011 From: barry at python.org (Barry Warsaw) Date: Fri, 10 Jun 2011 16:49:05 -0400 Subject: [Email-SIG] rfc822 parser (the elephant has landed) In-Reply-To: <20110610135827.0AD0D2505A0@webabinitio.net> References: <20110608182814.8BD2625012E@webabinitio.net> <20110608164850.451f5344@neurotica.wooz.org> <20110608224655.0C84E25012E@webabinitio.net> <871uz3o3cr.fsf@uwakimon.sk.tsukuba.ac.jp> <20110610135827.0AD0D2505A0@webabinitio.net> Message-ID: <20110610164905.7979be9b@neurotica.wooz.org> On Jun 10, 2011, at 09:58 AM, R. David Murray wrote: >If you combine the last two ideas, we could have a carefully defined >API for controlling how __setitem__ works using attributes on the >header classes. > >Totally crazy? Crazy-smart? Could be! >How about a validate function that takes a message and a policy? >That would be parallel to generator. In fact, it might share some code >with generator. Smart minds think alike. -Barry From stephen at xemacs.org Sun Jun 12 16:07:34 2011 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sun, 12 Jun 2011 23:07:34 +0900 Subject: [Email-SIG] rfc822 parser (the elephant has landed) In-Reply-To: <20110610135827.0AD0D2505A0@webabinitio.net> References: <20110608182814.8BD2625012E@webabinitio.net> <20110608164850.451f5344@neurotica.wooz.org> <20110608224655.0C84E25012E@webabinitio.net> <871uz3o3cr.fsf@uwakimon.sk.tsukuba.ac.jp> <20110610135827.0AD0D2505A0@webabinitio.net> Message-ID: <87ipsbm9cp.fsf@uwakimon.sk.tsukuba.ac.jp> R. David Murray writes: > Ah, right. Which means we don't support that currently... No biggie. As Barry says, .insert(0, resent_header) would do the trick. However, resent-* headers might be prepended as a block.
Do you support headers[0:0] = resent_header_list now? > > > One of my ideas is to eventually decouple the header dictionary from the > > > Message. > > > > I don't understand why you want to do that; > > Well, the main motivation was so that I could change the semantics of > __setitem__. Ah, OK. I've always thought the email representation of messages as a mapping of headers with a couple of special attributes was a little quirky but nice, that's all. It's not something that's hard to give up, especially since hs = msg.headers is always available. > So maybe it wouldn't be totally crazy to have unique headers __setitem__ > be replace while non-unique headers __setitem__ does append. We could > even go really crazy and have Resent headers __setitem__ do prepend :) But this kind of thing would probably have to be optional, since not every protocol that uses RFC 822-style headers is going to obey the modern rules that RFC 5322 requires. > The other way to control this "unique header" behavior would be to > change the header registry. If you are building an application whose > headers do not conform to the RFC, you would probably end up doing that > anyway. > > If you combine the last two ideas, we could have a carefully defined > API for controlling how __setitem__ works using attributes on the > header classes. > > Totally crazy? Crazy-smart? Totally crazy in the sense of +1 for more craziness. :-) > > > Yeah I think that would be insane :). > > > > +1 for insanity. > > Are you saying = should append to the value? I think that would be > bad/counterintuitive. No, just that insanity is a good thing as long as we don't implement more than the very best 10% of it. :-) > How about a validate function that takes a message and a policy? I would be comfortable with that API, for sure. Maybe there should be a way to set a default policy in the header registry. Perhaps each header in the registry could have its own ignore, warn, raise (, fix?) 
option, or even more flexibility. For example, you might want a policy so that email will accept and pass through multiple From fields, but never generate that (e.g., a mailing list). Alternatively, you might want an exception raised if an incoming message has multiple From fields (a local submission agent). From merwok at netwok.org Tue Jun 14 17:06:19 2011 From: merwok at netwok.org (Éric Araujo) Date: Tue, 14 Jun 2011 17:06:19 +0200 Subject: [Email-SIG] rfc822 parser (the elephant has landed) In-Reply-To: <20110610182740.897052505A0@webabinitio.net> References: <20110608182814.8BD2625012E@webabinitio.net> <4DF24DB0.4060809@netwok.org> <20110610182740.897052505A0@webabinitio.net> Message-ID: <4DF778EB.6020807@netwok.org> On 10/06/2011 20:27, R. David Murray wrote: > On Fri, 10 Jun 2011 19:00:32 +0200, wrote: > The problems there arise from C code calling (or, rather, not calling) > methods on the subclass. But in email, headers act *just like* strings, > but they have *extra* methods. So there should be no problem. Anything > that doesn't know about the extra methods will treat the header just > like a string, which is exactly what we want for backward compatibility > reasons. Good. > The one place where this might bite us is in the proposed support for += > and -=. I haven't tested that yet, and if it does work I'm not sure > that there won't be obscure corners which will turn out to be broken. I don't know either. >> The second cent is about naming. Does a Mailbox represent an email >> address? The confusion with mailbox.Mailbox would be a problem. > Well, that is an issue. I'm not entirely happy about the name, but I > haven't thought of a better one. The problem is that we have to deal > both with a full 'mailbox' and the 'addr-spec' subpart, and I don't know > of *any* other name (other than 'addr-spec') for the addr-spec part. > (Well, 'address', but you can see the problem with using that for both > meanings...)
Perhaps it would be better to use that (or rather > addr_spec), and use 'address' for the address-with-display-name > ('mailbox'). Yep, +1 for using addr_spec for the format defined in the RFCs, and address for the higher-level full address more familiar to humans. > Good point. rfc822parser is completely distinct from 'parser', which > probably won't get deprecated. On the other hand, once I add RFC2047 > support to it, perhaps I should rename it rfcparser (or, at least at > first, _rfcparser). Or perhaps _headerparser, though it doesn't > contain *all* of the header parsing machinery. After reading your blog post and this email, I still can't say whether this parser module deals with headers only or with full messages. If it's the former, definite +1 to _headerparser; if it's the latter, then _rfcparser or something else would be okay. Regards From rdmurray at bitdance.com Sat Jun 18 21:26:00 2011 From: rdmurray at bitdance.com (R. David Murray) Date: Sat, 18 Jun 2011 15:26:00 -0400 Subject: [Email-SIG] smtplib.send_message Message-ID: <20110618192601.6B144250D3A@webabinitio.net> For various reasons the email6 work is on temporary hold. But I've been working on some outstanding email5.1 bugs, and I've got a dilemma. I introduced smtplib send_message as a convenient way of sending a Message object. I did not, however, fully consider the implications of having done this the way I did it. A bug has been reported that it doesn't follow RFC2822 rules when auto-detecting the sender and recipient addresses. Specifically it ignores Sender and any Resent headers. The issue is here: http://bugs.python.org/issue12147 At first I thought, sure, let's go ahead and fix the logic. But as I was fixing up the docs in that patch, it occurred to me that there is a problematic case: what if there is more than one set of Resent- headers?
Detecting just the most recent set of headers is not, as far as I can tell, algorithmically possible, and indeed the RFC prohibits using them for automated processing. Heuristically we could be right probably 99% of the time. So, opinions: should I implement the heuristics, or should I refuse to guess and bail if from_addr and/or to_addrs is None and there are any Resent headers in the message? (Third alternative: continue to auto-detect it if there is only one set, as the current patch does.) In hindsight I should probably not have supported defaulting to picking up the values from the Message object, but absent the proposed email6 extended headers it really is a very handy convenience. -- R. David Murray http://www.bitdance.com From senthil at uthcode.com Sat Jun 18 22:26:50 2011 From: senthil at uthcode.com (Senthil Kumaran) Date: Sat, 18 Jun 2011 13:26:50 -0700 Subject: [Email-SIG] smtplib.send_message In-Reply-To: <20110618192601.6B144250D3A@webabinitio.net> References: <20110618192601.6B144250D3A@webabinitio.net> Message-ID: <20110618202650.GA2408@mathmagic> Hi David, On Sat, Jun 18, 2011 at 03:26:00PM -0400, R. David Murray wrote: > So, opinions: should I implement the heuristics, or should I refuse > to guess and bail if from_addr and/or to_addrs is None and there are > any Resent headers in the message? (Third alternative: continue to > auto-detect it if there is only one set, as the current patch does.) It would be difficult to conclude on this without knowing what would be a reasonable user expectation. For example, if some existing software is not following the RFC to the letter, but provides some convenience which users have gotten used to, then providing that convenience seems no harm to me. I hope that landing upon the heuristics would be a rare occasion and not a common case. -- Senthil From rdmurray at bitdance.com Sun Jun 19 00:26:16 2011 From: rdmurray at bitdance.com (R.
David Murray) Date: Sat, 18 Jun 2011 18:26:16 -0400 Subject: [Email-SIG] smtplib.send_message In-Reply-To: <20110618202650.GA2408@mathmagic> References: <20110618192601.6B144250D3A@webabinitio.net> <20110618202650.GA2408@mathmagic> Message-ID: <20110618222617.9B5BC250D3A@webabinitio.net> On Sat, 18 Jun 2011 13:26:50 -0700, Senthil Kumaran wrote: > On Sat, Jun 18, 2011 at 03:26:00PM -0400, R. David Murray wrote: > > So, opinions: should I implement the heuristics, or should I refuse > > to guess and bail if from_addr and/or to_addrs is None and there are > > any Resent headers in the message? (Third alternative: continue to > > auto-detect it if there is only one set, as the current patch does.) > > It would be difficult to conclude on this without knowing what would > be a reasonable user expectation. For example, if some existing software > is not following the RFC to the letter, but provides some convenience which > users have gotten used to, then providing that convenience seems no > harm to me. I hope that landing upon the heuristics would be a rare > occasion and not a common case. Well, the most typical scenario is an MUA application that has processed a message, and the disposition it wants to make of the message (either at user command if it is an interactive app or via rules if not) is to re-inject the message, sending it to new recipients. This is typically called "bouncing" the message. In that scenario, the *first* bounce involves one set of Resent- headers, and that is unambiguous. But as soon as you consider bouncing a message that has already been bounced, you have multiple sets of headers. Now, the application knows who it wants to send the message to and who is sending it, so it can correctly specify from_addr and to_addrs.
The convenience being provided by send_message is not having to separately track and compute the this-hop sender and recipients, but being able to compute them only once when creating the Resent- headers, and then have send_message extract them from the Message via the Resent- headers. But this use case is not supported by the RFC. So, how often would the heuristics be used? Any time an already-resent message was again resent. This is not a common occurrence, but neither is it a truly marginal one. -- R. David Murray http://www.bitdance.com From merwok at netwok.org Sun Jun 19 16:44:01 2011 From: merwok at netwok.org (Éric Araujo) Date: Sun, 19 Jun 2011 16:44:01 +0200 Subject: [Email-SIG] smtplib.send_message In-Reply-To: <20110618192601.6B144250D3A@webabinitio.net> References: <20110618192601.6B144250D3A@webabinitio.net> Message-ID: <4DFE0B31.9020004@netwok.org> > In hindsight I should probably have not supported defaulting to picking > up the values from the Message object, but absent the proposed email6 > extended headers it really is a very handy convenience. Anecdotal user expectation: While I know that email and smtplib are different modules, I really don't see why I have to give again some headers to the SMTP object when they're already here in the message object. So +1 for the convenience. Regards From phd at phdru.name Sun Jun 19 17:04:45 2011 From: phd at phdru.name (Oleg Broytman) Date: Sun, 19 Jun 2011 19:04:45 +0400 Subject: [Email-SIG] smtplib.send_message In-Reply-To: <4DFE0B31.9020004@netwok.org> References: <20110618192601.6B144250D3A@webabinitio.net> <4DFE0B31.9020004@netwok.org> Message-ID: <20110619150445.GA12886@iskra.aviel.ru> On Sun, Jun 19, 2011 at 04:44:01PM +0200, Éric Araujo wrote: > I really don't see why I have to give again some > headers to the SMTP object when they're already here in the message > object. Bcc, e.g., is a header that either isn't in the message or must be removed from the message before hitting the wire.
Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From phd at phdru.name Sun Jun 19 17:13:51 2011 From: phd at phdru.name (Oleg Broytman) Date: Sun, 19 Jun 2011 19:13:51 +0400 Subject: [Email-SIG] smtplib.send_message In-Reply-To: <20110618192601.6B144250D3A@webabinitio.net> References: <20110618192601.6B144250D3A@webabinitio.net> Message-ID: <20110619151351.GB12886@iskra.aviel.ru> On Sat, Jun 18, 2011 at 03:26:00PM -0400, R. David Murray wrote: > So, opinions: should I implement the heuristics, or should I refuse > to guess and bail if from_addr and/or to_addrs is None and there are > any Resent headers in the message? (Third alternative: continue to > auto-detect it if there is only one set, as the current patch does.) I vote for the third alternative: do your best if you can deduce the values but refuse to guess. Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From stephen at xemacs.org Sun Jun 19 19:37:10 2011 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 20 Jun 2011 02:37:10 +0900 Subject: [Email-SIG] smtplib.send_message In-Reply-To: <20110618192601.6B144250D3A@webabinitio.net> References: <20110618192601.6B144250D3A@webabinitio.net> Message-ID: <87mxhdk9ix.fsf@uwakimon.sk.tsukuba.ac.jp> R. David Murray writes: > So, opinions: should I implement the heuristics, or should I refuse > to guess and bail if from_addr and/or to_addrs is None and there are > any Resent headers in the message? (Third alternative: continue to > auto-detect it if there is only one set, as the current patch > does.) The heuristics should be implemented as a separate function or method, and a way to specify the function to call. My taste would be to default the control variable/attribute to None, but if the use case you're thinking about is sufficiently common or you want backward compatibility, you could default it to your heuristics. 
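To make the "separate function plus a control variable defaulting to None" suggestion concrete, here is one possible sketch. The names envelope_from_headers, resent_resolver and topmost_resent_block are hypothetical and not part of smtplib; the example heuristic simply trusts the topmost Resent- fields, which is a guess about what such heuristics would do.

```python
from email.message import Message
from email.utils import getaddresses


def envelope_from_headers(msg, resent_resolver=None):
    """Return (from_addr, to_addrs) for msg.

    resent_resolver is a hypothetical hook: a callable that takes the message
    and returns (from_addr, to_addrs). With the default of None, a message
    carrying any Resent- header is rejected instead of guessed at.
    """
    has_resent = any(name.lower().startswith("resent-") for name in msg.keys())
    if has_resent:
        if resent_resolver is None:
            raise ValueError("Resent- headers present; pass addresses explicitly")
        return resent_resolver(msg)
    sender = msg["Sender"] or msg["From"]
    if sender is None:
        raise ValueError("message has neither Sender nor From")
    from_addr = getaddresses([sender])[0][1]
    fields = [v for name in ("To", "Cc", "Bcc") for v in msg.get_all(name, [])]
    return from_addr, [addr for _, addr in getaddresses(fields)]


def topmost_resent_block(msg):
    """Example heuristic (an assumption): trust the first Resent- fields."""
    senders = msg.get_all("Resent-Sender") or msg.get_all("Resent-From") or []
    if not senders:
        raise ValueError("no Resent-Sender or Resent-From header")
    from_addr = getaddresses(senders[:1])[0][1]
    fields = [msg.get_all(name)[0]
              for name in ("Resent-To", "Resent-Cc", "Resent-Bcc")
              if msg.get_all(name)]
    return from_addr, [addr for _, addr in getaddresses(fields)]
```

An application that wants the guessing behaviour would call envelope_from_headers(msg, resent_resolver=topmost_resent_block); everyone else gets an error as soon as Resent- headers appear.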
From senthil at uthcode.com Mon Jun 20 01:32:00 2011 From: senthil at uthcode.com (Senthil Kumaran) Date: Sun, 19 Jun 2011 16:32:00 -0700 Subject: [Email-SIG] smtplib.send_message In-Reply-To: <20110618222617.9B5BC250D3A@webabinitio.net> References: <20110618192601.6B144250D3A@webabinitio.net> <20110618202650.GA2408@mathmagic> <20110618222617.9B5BC250D3A@webabinitio.net> Message-ID: <20110619233200.GB3964@mathmagic> On Sat, Jun 18, 2011 at 06:26:16PM -0400, R. David Murray wrote: > from_addr and to_addrs. The convenience being provided by send_message is > not having to separately track and compute the this-hop sender and recipients, > but being able to compute them only once when creating the Resent- > headers, and then have send_message extract them from the Message via > the Resent- headers. But this use case is not supported by the RFC. I got the point. Can the heuristics be turned off and defaulted to reject if someone using the smtplib api wants it that way? If yes, then the approach seems fine to me. +1. -- Senthil From rdmurray at bitdance.com Mon Jun 20 04:47:55 2011 From: rdmurray at bitdance.com (R. David Murray) Date: Sun, 19 Jun 2011 22:47:55 -0400 Subject: [Email-SIG] smtplib.send_message In-Reply-To: <87mxhdk9ix.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20110618192601.6B144250D3A@webabinitio.net> <87mxhdk9ix.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20110620024755.DC167250D3A@webabinitio.net> On Mon, 20 Jun 2011 02:37:10 +0900, "Stephen J. Turnbull" wrote: > R. David Murray writes: > > > So, opinions: should I implement the heuristics, or should I refuse > > to guess and bail if from_addr and/or to_addrs is None and there are > > any Resent headers in the message? (Third alternative: continue to > > auto-detect it if there is only one set, as the current patch > > does.) > > The heuristics should be implemented as a separate function or method, > and a way to specify the function to call.
My taste would be to > default the control variable/attribute to None, but if the use case > you're thinking about is sufficiently common or you want backward > compatibility, you could default it to your heuristics. Hmm. That would be an API addition. Currently the code is just buggy, in that it completely ignores Resent- headers, which is just wrong. So having a new API switch default to None would be fine, and my preference. Given what everyone has said, it sounds like for 3.2 I should fix the bug by implementing the stuff that is not a guess (Sender, and using Resent- if there is only *one* set of Resent- headers), and generate a ValueError if there is more than one copy of any of the Resent- headers. Then for 3.3 we could have a "guess please" knob...but I think I'm not going to take the time to implement it until we get an actual request for it. -- R. David Murray http://www.bitdance.com From stephen at xemacs.org Mon Jun 20 10:57:14 2011 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 20 Jun 2011 17:57:14 +0900 Subject: [Email-SIG] smtplib.send_message In-Reply-To: <20110620024755.DC167250D3A@webabinitio.net> References: <20110618192601.6B144250D3A@webabinitio.net> <87mxhdk9ix.fsf@uwakimon.sk.tsukuba.ac.jp> <20110620024755.DC167250D3A@webabinitio.net> Message-ID: <87oc1sanit.fsf@uwakimon.sk.tsukuba.ac.jp> R. David Murray writes: > On Mon, 20 Jun 2011 02:37:10 +0900, "Stephen J. Turnbull" wrote: > > The heuristics should be implemented as a separate function or method, > > and a way to specify the function to call. > > Hmm. That would be an API addition. Conceded. I'm happy to delegate that decision to you, as well as the variable's default if implemented.
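The 3.2 plan described above (prefer Sender, use the Resent- fields only when there is exactly one set, raise ValueError otherwise) might be sketched roughly as follows. This is an illustration of the rule as stated in the thread, not the actual smtplib patch; detect_envelope is a made-up name.

```python
from collections import Counter
from email.message import Message
from email.utils import getaddresses


def detect_envelope(msg):
    """Return (from_addr, to_addrs) using only unambiguous header data."""
    resent = Counter(name.lower() for name in msg.keys()
                     if name.lower().startswith("resent-"))
    if any(count > 1 for count in resent.values()):
        # The message was resent more than once; refuse to guess which
        # block of Resent- headers describes the current hop.
        raise ValueError("message has more than one set of Resent- headers")
    prefix = "Resent-" if resent else ""
    # Sender, when present, takes precedence over From.
    sender = msg[prefix + "Sender"] or msg[prefix + "From"]
    if sender is None:
        raise ValueError("no sender header found")
    from_addr = getaddresses([sender])[0][1]
    fields = [value for name in ("To", "Cc", "Bcc")
              for value in msg.get_all(prefix + name, [])]
    return from_addr, [addr for _, addr in getaddresses(fields)]
```

Callers that hit the ValueError still have the option of passing from_addr and to_addrs explicitly, which is always unambiguous.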
From barry at python.org Mon Jun 20 16:21:15 2011 From: barry at python.org (Barry Warsaw) Date: Mon, 20 Jun 2011 10:21:15 -0400 Subject: [Email-SIG] smtplib.send_message In-Reply-To: <20110619233200.GB3964@mathmagic> References: <20110618192601.6B144250D3A@webabinitio.net> <20110618202650.GA2408@mathmagic> <20110618222617.9B5BC250D3A@webabinitio.net> <20110619233200.GB3964@mathmagic> Message-ID: <20110620102115.62b687b1@neurotica.wooz.org> My main concern is that whatever you decide to do, it is well documented, well tested, and completely predictable. We don't want to make it easy for applications to do the wrong thing and accidentally send a message to the wrong recipients. Reducing functionality to ensure this is fine with me. -Barry