From xavier.delannoy at cloudmark.com Wed Jul 6 11:53:43 2011 From: xavier.delannoy at cloudmark.com (xavier delannoy) Date: Wed, 6 Jul 2011 11:53:43 +0200 Subject: [Email-SIG] question about the best way to check if an email is valid (RFC compliant) Message-ID: <4E1430A7.3060701@cloudmark.com> Hi, I use the python email library for an Automation Test Framework. The framework tests a Mail Transfer Agent. I notice that the "email" library silently fixes a lot of MIME errors. I can understand this behaviour, but I need to validate that the emails sent by the MTA are correct. I wonder if there's a way to use the python MIME Parser more "aggressively" (without modifying the email) and raise an exception as soon as an error is detected. Here are my needs and an extract of my python code:

    import email.errors

    # Parse the received email
    try:
        received_email = email.message_from_string(args)
    except email.errors.MessageError as msg:
        self.ok((False, msg), msg)

I test with an invalid email (for example, with a missing closing boundary), and no exception is raised. The email lib fixes the issue. Is there a way to tell the email lib not to modify the email, and to raise an exception instead? Regards, -- Xavier QA Engineer at Cloudmark Labs. From rdmurray at bitdance.com Wed Jul 6 14:08:55 2011 From: rdmurray at bitdance.com (R. David Murray) Date: Wed, 06 Jul 2011 08:08:55 -0400 Subject: [Email-SIG] question about the best way to check if an email is valid (RFC compliant) In-Reply-To: <4E1430A7.3060701@cloudmark.com> References: <4E1430A7.3060701@cloudmark.com> Message-ID: <20110706120856.349E02506C1@webabinitio.net> On Wed, 06 Jul 2011 11:53:43 +0200, xavier delannoy wrote: > I use the python email library for an Automation Test Framework. The > framework tests a Mail Transfer Agent. I notice that the "email" library > silently fixes a lot of MIME errors. I can understand this behaviour, but > I need to validate that the emails sent by the MTA are correct.
> I wonder if there's a way to use the python MIME Parser more > "aggressively" (without modifying the email) and raise an exception as > soon as an error is detected. > > Here are my needs and an extract of my python code: Not at the moment. You can check the defects attribute afterward to see if there were any detected errors, though. In email6 (planned for python 3.3) we will be providing a facility for doing the raise immediately. I don't know what you mean by leaving the message unmodified, though, since the input string is already unmodified, and you won't get a message object when an error is raised. --David From xavier.delannoy at cloudmark.com Wed Jul 6 15:58:22 2011 From: xavier.delannoy at cloudmark.com (xavier delannoy) Date: Wed, 6 Jul 2011 15:58:22 +0200 Subject: [Email-SIG] question about the best way to check if an email is valid (RFC compliant) In-Reply-To: <20110706120856.349E02506C1@webabinitio.net> References: <4E1430A7.3060701@cloudmark.com> <20110706120856.349E02506C1@webabinitio.net> Message-ID: <4E1469FE.8000906@cloudmark.com> On 07/06/2011 02:08 PM, R. David Murray wrote: > On Wed, 06 Jul 2011 11:53:43 +0200, xavier delannoy wrote: >> I use the python email library for an Automation Test Framework. The >> framework tests a Mail Transfer Agent. I notice that the "email" library >> silently fixes a lot of MIME errors. I can understand this behaviour, but >> I need to validate that the emails sent by the MTA are correct. >> I wonder if there's a way to use the python MIME Parser more >> "aggressively" (without modifying the email) and raise an exception as >> soon as an error is detected. >> >> Here are my needs and an extract of my python code: > > Not at the moment. You can check the defects attribute afterward to > see if there were any detected errors, though. > > In email6 (planned for python 3.3) we will be providing a facility for > doing the raise immediately.
I don't know what you mean by leaving > the message unmodified, though, since the input string is already > unmodified, and you won't get a message object when an error is raised. > If an error is raised and I don't get a message object, then I'm fine. But with Python 2.7.1, I get a message object and the defects attribute is empty. In the attachment you will find: - orig.eml: an email with an error. The boundary "000101020201080900040301" isn't closed. - after_parsing.eml: the same email after calling email.message_from_file(). The boundary is now closed, and the defects attribute is empty. - test.py: a python script to reproduce the problem. -- Xavier > --David -------------- next part -------------- A non-text attachment was scrubbed... Name: sample.tgz Type: application/x-compressed-tar Size: 1853 bytes Desc: not available URL: From rdmurray at bitdance.com Wed Jul 6 20:41:40 2011 From: rdmurray at bitdance.com (R. David Murray) Date: Wed, 06 Jul 2011 14:41:40 -0400 Subject: [Email-SIG] question about the best way to check if an email is valid (RFC compliant) In-Reply-To: <4E145693.4030306@cloudmark.com> References: <4E1430A7.3060701@cloudmark.com> <20110706120856.349E02506C1@webabinitio.net> <4E145693.4030306@cloudmark.com> Message-ID: <20110706184141.389112505A3@webabinitio.net> On Wed, 06 Jul 2011 14:35:31 +0200, xavier delannoy wrote: > If an error is raised and I don't get a message object, then I'm > fine. But with Python 2.7.1, I get a message object and the defects > attribute is empty. Please file a bug report about this at bugs.python.org and add me (tracker id r.david.murray) to the nosy list. It may not be fixable as a bug since the current email package makes no promises about detecting all errors. However, since, as you point out, the resulting message structure is changed, it probably is classifiable as a bug. I'll take a look at the issue when I get a chance, or you could propose a patch if you are motivated to do so. -- R.
David Murray http://www.bitdance.com From rdmurray at bitdance.com Wed Jul 13 18:05:36 2011 From: rdmurray at bitdance.com (R. David Murray) Date: Wed, 13 Jul 2011 12:05:36 -0400 Subject: [Email-SIG] new blog post Message-ID: <20110713160537.458D0B14005@webabinitio.net> I just posted a summary of my past month of work (which has been at a considerably slower pace than earlier): http://www.bitdance.com/blog/2011/07/13_01_email6_summer_vacation/ As I report there, it looks like I have to take a break this summer to do other stuff, but will pick it up again in the fall. I think that will still give us enough test time before 3.3 beta to get this into 3.3. I should still be able to cobble together bits and pieces of time, so I'm thinking about posting at least one specific task to the python-mentor's list (additional tests and the resulting fixes for the parser) to see if anyone wants to help out. Or perhaps someone here does :) I'll also try to find some time to work on the docs. The big summer project hasn't actually started yet, so I may make some non-trivial progress before it does. -- R. David Murray http://www.bitdance.com From rdmurray at bitdance.com Wed Jul 13 18:15:41 2011 From: rdmurray at bitdance.com (R. David Murray) Date: Wed, 13 Jul 2011 12:15:41 -0400 Subject: [Email-SIG] API question with respect to address objects Message-ID: <20110713161542.2AD86B14005@webabinitio.net> So we have these address objects. Currently their string value is the "decoded" value, which means the content from the source is preserved except that (when I get it working!) encoded words and IDNA are decoded. As reported in the blog post, I've currently added a 'reformatted' attribute to give access to the value formatted according to RFC rules (but currently it isn't handling re-encoding). So, the question is, what do we want the API to ultimately be?
I'm thinking that the string value should be the "idealized" value, which would mean we decode it fully and make it RFC conformant where that makes sense (minimal quoting, removal of spaces around dots in local parts, etc). We'll also want a way to get the wire format version of the address out (properly encoded to ASCII). And of course the application may want access to the 'source' value that was parsed to create the Address object. I'm thinking that version should probably be a strict substring of the source attribute of the header the address belongs to. Thoughts? -- R. David Murray http://www.bitdance.com From rdmurray at bitdance.com Tue Jul 19 23:21:39 2011 From: rdmurray at bitdance.com (R. David Murray) Date: Tue, 19 Jul 2011 17:21:39 -0400 Subject: [Email-SIG] email-6.0.0.a1 Message-ID: <20110719212139.D5D732500D5@webabinitio.net> OK, so I've released the first iteration of the email6 package on pypi as email-6.0.0a1. After install you import it as email6. This will allow anyone curious and/or motivated to test it out under Python 3.2. I'm especially interested in anyone with a working program that uses email in 3.2: it should be completely backward compatible, and if it isn't I want to know ASAP.[*] I've also opened issue 12586 for review of the delta between default and the code that is in the release. I'd like to check the code in to default and continue to work on it from there. As I said in the issue comments: "When we originally planned out email6 we thought we'd be making a "compatibility break" with backward compatibility shims. As things have turned out the work is more a matter of incremental improvement of the API while maintaining the old API, and thus it seems reasonable to me to work on it directly in default rather than continue to work on it in a separate feature branch." Assuming, that is, that the general approach represented by *this* delta is accepted. 
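For readers comparing this discussion with current Python: the header and address objects debated above eventually landed in the standard library as `email.headerregistry`, used through `email.policy.default`. The following is a minimal sketch assuming Python 3.6 or later, not the email6 alpha API; the sample `To:` header and addresses are invented for illustration. It shows a parsed address header that is still a `str` but also exposes individual `Address` objects, and an `Address` built from parts whose `str()` adds quotes only when the display name needs them.

```python
from email import policy
from email.headerregistry import Address
from email.parser import Parser

# Invented sample: one display name that needs quoting (it contains a comma)
# and one bare addr-spec.
RAW = 'To: "Doe, Jane" <jane@example.com>, bob@example.org\n\nhello\n'

msg = Parser(policy=policy.default).parsestr(RAW)
to = msg["To"]

# The header object is a str subclass, so it still behaves like a string...
print(isinstance(to, str))  # True
print(str(to))

# ...but it also exposes the individual parsed addresses.
for addr in to.addresses:
    print(repr(addr.display_name), addr.username, addr.domain, addr.addr_spec)

# An Address can also be built from its parts; str() adds quotes only when
# the display name contains special characters.
built = Address(display_name="Jane Doe", username="jane", domain="example.com")
print(built)  # Jane Doe <jane@example.com>
```

Here "Doe, Jane" is quoted in the string form because it contains a comma, while "Jane Doe" is left unquoted, which is close to the minimal-quoting "idealized" string value proposed in the address-object message.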
What this delta adds to email is a conversion to handling all headers as full-blown objects (as opposed to strings, tuples of strings, or Header objects, depending on context). The object type is a subclass of str, so the headers act like strings if you don't use their additional API. The basic additional API is a 'source' attribute that contains the text the parser read from the input source, and a 'value' attribute that contains the value with all the Content-Transfer-Encoding stuff undone so that you have a real unicode string. By changing a policy setting, you can have that value as the string value of the header. You can also assign a string with non-ASCII characters to a header, and the right thing will happen. (Well, eventually it will happen...right now it only works correctly for unstructured headers.) Further, Date headers have a datetime attribute (and accept being set to a datetime), and address headers have attributes for accessing the individual addresses in the header. Other structured headers will eventually grow additional attributes as well. The general approach has been discussed with and approved by the email-sig, but all comments are welcome. I know there's room for bikeshedding on some aspects of the API; in some cases I've done some "placeholder" stuff pending a more complete solution to certain design goals. I have a big project in the offing over the next couple of months. QNX is still fully behind the funding for email6 development, but I probably won't be able to complete it until the fall. So I'd like to get this chunk (the biggest chunk of new code, considering the size of the parser) reviewed and checked in if possible. I'll keep working on the bits of functionality that aren't quite complete and the bugs that I know are there until my big project kicks off, but I wanted to release/post now so that there might be a chance of some review happening while I still have time to respond quickly to the feedback. -- R.
David Murray http://www.bitdance.com [*] I believe that if you try to use an email6 Message object with the 3.2 mailbox module you will run into some trouble, but I think it ought to be possible to make it work with the right magic :) PS: I don't have much experience writing parsers, so I'm expecting some critical comments about my parser design. It had to be a custom parser since otherwise I'd be blocked waiting for some other software to get accepted into the stdlib, but it certainly wound up being a bigger chunk of code than I expected when I started writing it. From rdmurray at bitdance.com Mon Jul 25 21:42:37 2011 From: rdmurray at bitdance.com (R. David Murray) Date: Mon, 25 Jul 2011 15:42:37 -0400 Subject: [Email-SIG] header folding Message-ID: <20110725194238.3ABCB2505A8@webabinitio.net> Well, my big project still hasn't kicked off, so I'm still working on email6. I just posted a new blog post: http://www.bitdance.com/blog/2011/07/25_01_email6_pypi_release/ The PyPI release is old news here. The interesting part of the post for this group is the discussion of the new header folding API at the end. Basically, BaseHeader gets a 'wrap' method, and there is a new policy control, 'refold_source' (I'll probably rename it to 'rewrap_source', since I expect to apply it also to message bodies). The policy control has three values: 'none', 'long', and 'all'. 'none' means never touch the source; always use it as is. 'long' means refold a header if any of the source's component lines are longer than max_line_length. 'all' means refold everything. Email5.1 wraps long lines, but leaves short lines alone. Under 'long', this code refolds the whole header if there is a long line in it. I think that is more RFC compliant, and I don't think it will cause any problems if used. The default for refold_source is 'none'. I'm considering this a bug fix, since a stated goal of the email package is to reproduce the source accurately if possible.
(Currently the new code still calls Header to do the folding; writing the new folder is my next task.) -- R. David Murray http://www.bitdance.com From stephen at xemacs.org Tue Jul 26 06:03:11 2011 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 26 Jul 2011 13:03:11 +0900 Subject: [Email-SIG] header folding In-Reply-To: <20110725194238.3ABCB2505A8@webabinitio.net> References: <20110725194238.3ABCB2505A8@webabinitio.net> Message-ID: <87mxg1r8n4.fsf@uwakimon.sk.tsukuba.ac.jp> R. David Murray writes: > the end. Basically, BaseHeader gets a 'wrap' method, and there is > a new policy control, 'refold_source' (I'll probably rename it to > 'rewrap_source', since I expect to apply it also to message > bodies). This bothers me. Folding and wrapping are two different things. Folding is about invertibly reformatting a single logical line to make machines happy during transmission, what wrapping "does" is not 100% clear to me but it's about making people happy. (I put "does" in quotes because it's not obvious to me that the source of wrapped text necessarily is a single anything, nor that wrapping need be invertible.) I grant that people and many MUAs take a different point of view about header folding, but clearly the RFCs have moved away from placing any importance on presentation aspects toward specifying an invertible transformation exactly. On the other hand, I think that wrapping should place emphasis on presentation. From rdmurray at bitdance.com Tue Jul 26 14:38:26 2011 From: rdmurray at bitdance.com (R. David Murray) Date: Tue, 26 Jul 2011 08:38:26 -0400 Subject: [Email-SIG] header folding In-Reply-To: <87mxg1r8n4.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20110725194238.3ABCB2505A8@webabinitio.net> <87mxg1r8n4.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20110726123827.3EC7FB14005@webabinitio.net> On Tue, 26 Jul 2011 13:03:11 +0900, "Stephen J. Turnbull" wrote: > R. David Murray writes: > > > the end. 
Basically, BaseHeader gets a 'wrap' method, and there is > > a new policy control, 'refold_source' (I'll probably rename it to > > 'rewrap_source', since I expect to apply it also to message > > bodies). > > This bothers me. Folding and wrapping are two different things. > > Folding is about invertibly reformatting a single logical line to make > machines happy during transmission, what wrapping "does" is not 100% > clear to me but it's about making people happy. (I put "does" in > quotes because it's not obvious to me that the source of wrapped text > necessarily is a single anything, nor that wrapping need be > invertible.) > > I grant that people and many MUAs take a different point of view about > header folding, but clearly the RFCs have moved away from placing any > importance on presentation aspects toward specifying an invertible > transformation exactly. On the other hand, I think that wrapping > should place emphasis on presentation. Hmm. Makes sense to me. So you'd rather the method were called "fold" and that refold_source remains the name of the policy control. What's the word for what is done when a text message is made to have a line length of less than 78 by using quoted printable (or base64) encoding? Is that also folding? If there's no existing term in common use, folding would make sense to me. So I have no objection to using 'fold' consistently in the api and code for these operations. Can anyone see a use case for controlling folding of headers separately from folding of message bodies? I haven't thought of one, which is why I'm thinking one policy knob controls both. --David From stephen at xemacs.org Wed Jul 27 09:18:36 2011 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Wed, 27 Jul 2011 16:18:36 +0900 Subject: [Email-SIG] header folding In-Reply-To: <20110726123827.3EC7FB14005@webabinitio.net> References: <20110725194238.3ABCB2505A8@webabinitio.net> <87mxg1r8n4.fsf@uwakimon.sk.tsukuba.ac.jp> <20110726123827.3EC7FB14005@webabinitio.net> Message-ID: <87hb68qjhv.fsf@uwakimon.sk.tsukuba.ac.jp> R. David Murray writes: > Hmm. Makes sense to me. So you'd rather the method were called "fold" > and that refold_source remains the name of the policy control. Yes. > What's the word for what is done when a text message is made to have > a line length of less than 78 by using quoted printable (or base64) > encoding? RFC 2045 discusses "insertion of soft line breaks"; it doesn't mention a term like "folding". "Folding" seems like a good term to me, though. Note that the RFC 2045 definition of quoted-printable says that physical line length MUST be 76 characters or less, including any terminating = but not the CRLF pair that separates lines. > Can anyone see a use case for controlling folding of headers > separately from folding of message bodies? I haven't thought of > one, which is why I'm thinking one policy knob controls both. The RFCs' treatments differ somewhat. RFC 5322 has both a MUST NOT and a SHOULD NOT exceed limit on line length (998 and 78 characters, not including the CRLF, respectively). RFC 2045 quoted-printable has only the MUST NOT limit of 76 (but the difference in limits is not a big deal). It's not clear to me what exactly the policy knob you're talking about is for body text. There is no policy really allowed if quoted-printable is being used. So the policy knob is whether to use quoted-printable to limit physical line length? The only reason I can think of for having separate controls is that many MUAs mishandle quoted-printable in the body text. Patches don't apply, one-time-key URLs in links get broken and fail to be recognized.
On the other hand, header-folding rarely has such consequences in my experience. From rdmurray at bitdance.com Wed Jul 27 14:20:38 2011 From: rdmurray at bitdance.com (R. David Murray) Date: Wed, 27 Jul 2011 08:20:38 -0400 Subject: [Email-SIG] header folding In-Reply-To: <87hb68qjhv.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20110725194238.3ABCB2505A8@webabinitio.net> <87mxg1r8n4.fsf@uwakimon.sk.tsukuba.ac.jp> <20110726123827.3EC7FB14005@webabinitio.net> <87hb68qjhv.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20110727122039.9033E2506ED@webabinitio.net> On Wed, 27 Jul 2011 16:18:36 +0900, "Stephen J. Turnbull" wrote: > It's not clear to me what exactly the policy knob you're talking about > is for body text. There is no policy really allowed if quoted-printable > is being used. So the policy knob is whether to use > quoted-printable to limit physical line length? Well, I have *not* looked at this in detail yet. By default nothing is changed (refold_source='none'). My preliminary thought was that if refold_source is 'long', and we come across a body that is wider than the RFC limit (or if the application wants to reformat to a different limit), we could reconstruct the body and refold it to the new limit. Perhaps this is not practical/useful; as I say I haven't gotten there yet :) > The only reason I can think of for having separate controls is that > many MUAs mishandle quoted-printable in the body text. Patches don't > apply, one-time-key URLs in links get broken and fail to be > recognized. On the other hand, header-folding rarely has such > consequences in my experience. That's an interesting point. So perhaps I should rename the control 'header_source_refold'. I hate making the name longer, but anything less would be ambiguous, and I've already got other controls with long names :(.
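Stephen's RFC 2045 constraint (physical lines of quoted-printable text MUST be 76 characters or less, counting any trailing "=" soft line break but not the CRLF) is easy to observe with the standard library's `quopri` module. This is a small illustration, not email6 code, and the 200-character input line is invented: it encodes the line and checks that the encoder inserted soft line breaks and that decoding removes them again.

```python
import quopri

# An invented body line: 200 characters and no line breaks at all.
original = b"0123456789" * 20

# encodestring() performs quoted-printable encoding (RFC 2045), inserting
# soft line breaks ("=" followed by a newline) to keep physical lines short.
encoded = quopri.encodestring(original)
lines = encoded.split(b"\n")

for line in lines:
    # Length of each physical line, and its last byte ("=" marks a soft break).
    print(len(line), line[-1:])

# Decoding removes the soft line breaks and restores the original bytes.
assert quopri.decodestring(encoded) == original
```

The encoder breaks after 75 data characters and appends the "=" marker, which is the "insertion of soft line breaks" RFC 2045 describes; whether to apply such encoding to an all-ASCII body only to shorten its lines is the separate policy question raised in this thread.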
On the other hand, we could also provide a separate control for whether or not quoted printable bodies in particular were folded, and consider both controls when deciding what to do with a particular quoted printable body. I favor the latter at the moment. -- R. David Murray http://www.bitdance.com From stephen at xemacs.org Wed Jul 27 16:07:33 2011 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 27 Jul 2011 23:07:33 +0900 Subject: [Email-SIG] header folding In-Reply-To: <20110727122039.9033E2506ED@webabinitio.net> References: <20110725194238.3ABCB2505A8@webabinitio.net> <87mxg1r8n4.fsf@uwakimon.sk.tsukuba.ac.jp> <20110726123827.3EC7FB14005@webabinitio.net> <87hb68qjhv.fsf@uwakimon.sk.tsukuba.ac.jp> <20110727122039.9033E2506ED@webabinitio.net> Message-ID: <87fwlrrf4q.fsf@uwakimon.sk.tsukuba.ac.jp> R. David Murray writes: > That's an interesting point. So perhaps I should rename the control > 'header_source_refold'. I don't have a strong opinion, but I tend to think it's unnecessary. > On the other hand, we could also provide a separate control > for whether or not quoted printable bodies in particular were > folded, If the body is already known to be quoted-printable, you don't really have a choice. Folding lines longer than 76 characters after quoted-printable encoding is required by RFC 2045. Of course you can do more folding than necessary (e.g., fold an 85-character line at 35 and 70 characters), but that doesn't seem very useful to me. It seems to me that the policy question (if it exists) is "We have an all-ASCII body with 'long lines'. Shall we encode in quoted-printable only for the purpose of folding the long lines?" From rdmurray at bitdance.com Wed Jul 27 17:34:13 2011 From: rdmurray at bitdance.com (R.
David Murray) Date: Wed, 27 Jul 2011 11:34:13 -0400 Subject: [Email-SIG] header folding In-Reply-To: <87fwlrrf4q.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20110725194238.3ABCB2505A8@webabinitio.net> <87mxg1r8n4.fsf@uwakimon.sk.tsukuba.ac.jp> <20110726123827.3EC7FB14005@webabinitio.net> <87hb68qjhv.fsf@uwakimon.sk.tsukuba.ac.jp> <20110727122039.9033E2506ED@webabinitio.net> <87fwlrrf4q.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20110727153414.A09C62506ED@webabinitio.net> On Wed, 27 Jul 2011 23:07:33 +0900, "Stephen J. Turnbull" wrote: > R. David Murray writes: > > > That's an interesting point. So perhaps I should rename the control > > 'header_source_refold'. > > I don't have a strong opinion, but I tend to think it's > unnecessary. > > > On the other hand, we could also provide a separate control > > for whether or not quoted printable bodies in particular were > > folded, > > If the body is already known to be quoted-printable, you don't really > have a choice. Folding lines longer than 76 characters after > quoted-printable encoding is required by RFC 2045. Of course you can Right, I realized what I said didn't make sense after I hit send :) > do more folding than necessary (e.g., fold an 85-character line at 35 > and 70 characters), but that doesn't seem very useful to me. Well, the use case I was thinking of was fixing up non-conformant output from another MUA (quoted printable but with overlong lines). I don't know if such exists in the wild, but I would expect that it does; everything else seems to :) Still, it may be a YAGNI, since any such are most likely to be spammers. > It seems to me that the policy question (if it exists) is "We have an > all-ASCII body with 'long lines'. Shall we encode in quoted-printable > only for the purpose of folding the long lines?" Yes, that would be a similar case: we have a body that doesn't conform to the "SHOULD" limit of 78; if refold_source is 'long', should we use QP to fold it?
But this question also arises if the application is attaching a text part with lines longer than 78 characters. As you suggested it might be the case that we don't want to QP encode such text. That question, QP encoding only to fold text parts with long lines, thus seems to be a separate policy control (and I do think we want one for it). So if we have 'refold_source' set to 'long', an unencoded text part with long lines would get QP encoded if and only if this new policy setting that we haven't named yet is set to fold such parts using QP. -- R. David Murray http://www.bitdance.com From barry at python.org Tue Jul 26 17:07:03 2011 From: barry at python.org (Barry Warsaw) Date: Tue, 26 Jul 2011 11:07:03 -0400 Subject: [Email-SIG] header folding In-Reply-To: <20110726123827.3EC7FB14005@webabinitio.net> References: <20110725194238.3ABCB2505A8@webabinitio.net> <87mxg1r8n4.fsf@uwakimon.sk.tsukuba.ac.jp> <20110726123827.3EC7FB14005@webabinitio.net> Message-ID: <20110726110703.2ec7b1e6@resist.wooz.org> On Jul 26, 2011, at 08:38 AM, R. David Murray wrote: >On Tue, 26 Jul 2011 13:03:11 +0900, "Stephen J. Turnbull" wrote: >> R. David Murray writes: >> >> > the end. Basically, BaseHeader gets a 'wrap' method, and there is >> > a new policy control, 'refold_source' (I'll probably rename it to >> > 'rewrap_source', since I expect to apply it also to message >> > bodies). >> >> This bothers me. Folding and wrapping are two different things. >> >> Folding is about invertibly reformatting a single logical line to make >> machines happy during transmission, what wrapping "does" is not 100% >> clear to me but it's about making people happy. (I put "does" in >> quotes because it's not obvious to me that the source of wrapped text >> necessarily is a single anything, nor that wrapping need be >> invertible.) 
>> >> I grant that people and many MUAs take a different point of view about >> header folding, but clearly the RFCs have moved away from placing any >> importance on presentation aspects toward specifying an invertible >> transformation exactly. On the other hand, I think that wrapping >> should place emphasis on presentation. > >Hmm. Makes sense to me. So you'd rather the method were called "fold" >and that refold_source remains the name of the policy control. Stephen makes a good one, one I agree with. >What's the word for what is done when a text message is made to have >a line length of less than 78 by using quoted printable (or base64) >encoding? Is that also folding? If there's no existing term in common >use, folding would make sense to me. So I have no objection to using >'fold' consistently in the api and code for these operations. Haven't we used 'splitting' as a term for this, at least internally, in previous versions? That's at least what I think of, and I do think we could have two knows to control the different functionality: - To 'split' a line means to take a line longer than a specified maximum, and make it fit into the maximum line length, splitting at whitespace or other semantic separators. - To 'fill' a header means to take the logical contents of the header and recombine and resplit it so that each line is as close to the maximum line length as possible. My analogy here is Emacs's M-q (fill-paragraph). What then is "folding" or "wrapping"? Maybe no different than the above. >Can anyone see a use case for controlling folding of headers separately >from folding of message bodies? I haven't thought of one, which is why >I'm thinking one policy knob controls both. You might have a message body that contains code, in which case you might want to fill the headers (using the terminology above), but not fill the body. Cheers, -Barry -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From barry at python.org Wed Jul 20 00:45:46 2011 From: barry at python.org (Barry Warsaw) Date: Tue, 19 Jul 2011 18:45:46 -0400 Subject: [Email-SIG] email-6.0.0.a1 In-Reply-To: <20110719212139.D5D732500D5@webabinitio.net> References: <20110719212139.D5D732500D5@webabinitio.net> Message-ID: <20110719184546.4eb8f52a@resist.wooz.org> On Jul 19, 2011, at 05:21 PM, R. David Murray wrote: >OK, so I've released the first iteration of the email6 package on pypi >as email-6.0.0a1. After install you import it as email6. This will >allow anyone curious and/or motivated to test it out under Python 3.2. >I'm especially interested in anyone with a working program that uses >email in 3.2: it should be completely backward compatible, and if it >isn't I want to know ASAP.[*] It'll take some time to digest, but congratulations RDM! You've accomplished an impressive milestone. -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From barry at python.org Tue Jul 26 17:07:03 2011 From: barry at python.org (Barry Warsaw) Date: Tue, 26 Jul 2011 11:07:03 -0400 Subject: [Email-SIG] header folding In-Reply-To: <20110726123827.3EC7FB14005@webabinitio.net> References: <20110725194238.3ABCB2505A8@webabinitio.net> <87mxg1r8n4.fsf@uwakimon.sk.tsukuba.ac.jp> <20110726123827.3EC7FB14005@webabinitio.net> Message-ID: <20110726110703.2ec7b1e6@resist.wooz.org> On Jul 26, 2011, at 08:38 AM, R. David Murray wrote: >On Tue, 26 Jul 2011 13:03:11 +0900, "Stephen J. Turnbull" wrote: >> R. David Murray writes: >> >> > the end. Basically, BaseHeader gets a 'wrap' method, and there is >> > a new policy control, 'refold_source' (I'll probably rename it to >> > 'rewrap_source', since I expect to apply it also to message >> > bodies). >> >> This bothers me. 
Folding and wrapping are two different things. >> >> Folding is about invertibly reformatting a single logical line to make >> machines happy during transmission, what wrapping "does" is not 100% >> clear to me but it's about making people happy. (I put "does" in >> quotes because it's not obvious to me that the source of wrapped text >> necessarily is a single anything, nor that wrapping need be >> invertible.) >> >> I grant that people and many MUAs take a different point of view about >> header folding, but clearly the RFCs have moved away from placing any >> importance on presentation aspects toward specifying an invertible >> transformation exactly. On the other hand, I think that wrapping >> should place emphasis on presentation. > >Hmm. Makes sense to me. So you'd rather the method were called "fold" >and that refold_source remains the name of the policy control. Stephen makes a good one, one I agree with. >What's the word for what is done when a text message is made to have >a line length of less than 78 by using quoted printable (or base64) >encoding? Is that also folding? If there's no existing term in common >use, folding would make sense to me. So I have no objection to using >'fold' consistently in the api and code for these operations. Haven't we used 'splitting' as a term for this, at least internally, in previous versions? That's at least what I think of, and I do think we could have two knows to control the different functionality: - To 'split' a line means to take a line longer than a specified maximum, and make it fit into the maximum line length, splitting at whitespace or other semantic separators. - To 'fill' a header means to take the logical contents of the header and recombine and resplit it so that each line is as close to the maximum line length as possible. My analogy here is Emacs's M-q (fill-paragraph). What then is "folding" or "wrapping"? Maybe no different than the above. 
>Can anyone see a use case for controlling folding of headers separately
>from folding of message bodies? I haven't thought of one, which is why
>I'm thinking one policy knob controls both.

You might have a message body that contains code, in which case you might want
to fill the headers (using the terminology above), but not fill the body.

Cheers,
-Barry

From rdmurray at bitdance.com Wed Jul 27 22:56:19 2011
From: rdmurray at bitdance.com (R. David Murray)
Date: Wed, 27 Jul 2011 16:56:19 -0400
Subject: [Email-SIG] header folding
In-Reply-To: <20110726110703.2ec7b1e6@resist.wooz.org>
References: <20110725194238.3ABCB2505A8@webabinitio.net>
 <87mxg1r8n4.fsf@uwakimon.sk.tsukuba.ac.jp>
 <20110726123827.3EC7FB14005@webabinitio.net>
 <20110726110703.2ec7b1e6@resist.wooz.org>
Message-ID: <20110727205620.08FCA2506ED@webabinitio.net>

On Tue, 26 Jul 2011 11:07:03 -0400, Barry Warsaw (by way of Barry Warsaw ) wrote:
> On Jul 26, 2011, at 08:38 AM, R. David Murray wrote:
> >What's the word for what is done when a text message is made to have
> >a line length of less than 78 by using quoted printable (or base64)
> >encoding? Is that also folding? If there's no existing term in common
> >use, folding would make sense to me. So I have no objection to using
> >'fold' consistently in the api and code for these operations.
>
> Haven't we used 'splitting' as a term for this, at least internally, in
> previous versions? That's at least what I think of, and I do think we could
> have two knobs to control the different functionality:

'split' and 'wrap' seem to be used somewhat interchangeably in the current
code and docs. I'm now consistently using 'fold' in the new code.
> - To 'split' a line means to take a line longer than a specified maximum, and
>   make it fit into the maximum line length, splitting at whitespace or other
>   semantic separators.

My current code doesn't do this anywhere. The old code does.

> - To 'fill' a header means to take the logical contents of the header and
>   recombine and resplit it so that each line is as close to the maximum line
>   length as possible. My analogy here is Emacs's M-q (fill-paragraph).

Neither my current code nor the old code does exactly this anywhere.

> What then is "folding" or "wrapping"? Maybe no different than the above.

Folding is an RFC term-of-art that implies the specific RFC rules for
making sure a semantic unit (header, body) has lines that are shorter
than the RFC defined maximum length. Wrapping is much more like your
'filling', but probably a less precise term, as filling does imply
maximizing line lengths, while wrapping to my ears does not have that
connotation as a requirement.

'refolding', as I've implemented it, consists of taking an existing folded
header, unfolding it, and then folding it according to the RFC rules and
recommendations. This may or may not put the maximum possible number
of characters on a line, depending on whether the header is structured
or unstructured and the content of said header. And it may or may not
exactly reproduce the original header, depending on how closely the
original folder and I agree on our interpretation of the RFC rules :)
(Which is why headers are only refolded by explicit request.)

So, I agree with Stephen, I think 'folding' is the correct term to
use here.

> >Can anyone see a use case for controlling folding of headers separately
> >from folding of message bodies? I haven't thought of one, which is why
> >I'm thinking one policy knob controls both.
>
> You might have a message body that contains code, in which case you might want
> to fill the headers (using the terminology above), but not fill the body.
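The "refolding" described here (unfold an existing header, then fold it again by the RFC rules, only when asked) corresponds to the `refold_source` policy control named earlier in the thread. As a sketch under assumptions: the standard-library `email.policy` module of Python 3.3 and later, which may differ from the email6 alpha under discussion, gives `EmailPolicy` a `refold_source` attribute taking `'none'`, `'long'` (the default, refolding only when a line exceeds `max_line_length`) or `'all'`. Parsing once and serializing under two cloned policies shows the difference.

```python
from email import message_from_string
from email.policy import default

# A Subject header that arrived already folded onto two physical lines.
RAW = "Subject: a short\n subject that was folded\n\nbody\n"

msg = message_from_string(RAW, policy=default)

# 'none': header values that came from the parser are written back untouched.
kept = msg.as_string(policy=default.clone(refold_source="none"))
# 'all': every parsed header is unfolded and then folded again by the policy.
refolded = msg.as_string(policy=default.clone(refold_source="all"))

print(kept.splitlines()[:2])
print(refolded.splitlines()[:1])
```

Under `'none'` the original fold point survives; under `'all'` the short header is unfolded and, since it fits within 78 characters, written back on one line.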
This is similar to the case we've already discussed, about excluding a text
body from being QP encoded. I think we don't currently do any paragraph
reflow, but it might be an interesting facility to add :)

--
R. David Murray           http://www.bitdance.com

From barry at python.org Thu Jul 28 01:10:42 2011
From: barry at python.org (Barry Warsaw)
Date: Wed, 27 Jul 2011 19:10:42 -0400
Subject: [Email-SIG] header folding
In-Reply-To: <20110727205620.08FCA2506ED@webabinitio.net>
References: <20110725194238.3ABCB2505A8@webabinitio.net>
 <87mxg1r8n4.fsf@uwakimon.sk.tsukuba.ac.jp>
 <20110726123827.3EC7FB14005@webabinitio.net>
 <20110726110703.2ec7b1e6@resist.wooz.org>
 <20110727205620.08FCA2506ED@webabinitio.net>
Message-ID: <20110727191042.1268a098@resist.wooz.org>

On Jul 27, 2011, at 04:56 PM, R. David Murray wrote:

>Wrapping is much more like your 'filling', but probably a less precise
>term, as filling does imply maximizing line lengths, while wrapping
>to my ears does not have that connotation as a requirement.

Is it just the guarantee of maximizing line lengths that's missing?

>'refolding', as I've implemented it, consists of taking an existing folded
>header, unfolding it, and then folding it according to the RFC rules and
>recommendations. This may or may not put the maximum possible number
>of characters on a line, depending on whether the header is structured
>or unstructured and the content of said header. And it may or may not
>exactly reproduce the original header, depending on how closely the
>original folder and I agree on our interpretation of the RFC rules :)
>(Which is why headers are only refolded by explicit request.)
>
>So, I agree with Stephen, I think 'folding' is the correct term to
>use here.

Okay. To me 'folding' is closer to 'splitting', while 'wrapping' is closer to
'filling' since in what you describe above, there is an 'unfolding' operation
that happens first. Note too that Emacs's filling doesn't guarantee maximal
line lengths (i.e.
fill-column) either, since long words can cause previous lines to be shorter.
That seems analogous to your description above.

-Barry

From stephen at xemacs.org Thu Jul 28 05:09:53 2011
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Thu, 28 Jul 2011 12:09:53 +0900
Subject: [Email-SIG] header folding
In-Reply-To: <20110726110703.2ec7b1e6@resist.wooz.org>
References: <20110725194238.3ABCB2505A8@webabinitio.net>
 <87mxg1r8n4.fsf@uwakimon.sk.tsukuba.ac.jp>
 <20110726123827.3EC7FB14005@webabinitio.net>
 <20110726110703.2ec7b1e6@resist.wooz.org>
Message-ID: <20016.54017.96798.606472@uwakimon.sk.tsukuba.ac.jp>

Barry Warsaw writes:

 > That's at least what I think of, and I do think we could
 > have two knobs to control the different functionality:
 >
 > - To 'split' a line means to take a line longer than a specified maximum, and
 >   make it fit into the maximum line length, splitting at whitespace or other
 >   semantic separators.

In the case of headers, "folding" is hallowed usage (going back to at
least RFC 733), and is very precisely defined by RFC 5322. If we are
going to do something non-RFC conformant (yeah, right, we might do
that, eh?), "splitting" would be better. If our implementation is
intended to be conformant, I think "folding" is preferable both for
familiarity and ease of reference ("look it up in RFC 5322").

I think the generalization to bodies is reasonable, although I haven't
found any RFC usage of "folding" in that context in a quick look.

 > - To 'fill' a header means to take the logical contents of the
 >   header and recombine and resplit it so that each line is as close
 >   to the maximum line length as possible. My analogy here is Emacs's
 >   M-q (fill-paragraph).
 > What then is [...] "wrapping"? Maybe no different than the above.
In my dialect, what you describe as "filling" is (at least potentially)
far more sophisticated than what I mean by "wrapping". Wrapping moves
forward through each line and at the maximum length backtracks to the
rightmost break point in the line, breaking there, then continuing the
process in the tail line. This could, and in my experience often does,
result in very uneven lines.

However, I don't think we're talking about filling here. Filling IMHO
should be implemented by the email module, but it should be called
explicitly by the client, not imposed internally on the basis of a
global policy. Consider the following ugly header (which is somewhat
unlikely to actually appear in a real use case, although it could
easily result from cut-and-paste into an MUA's To field):

    To: Amie Cawinski , Ichabod Tallman

(there is no trailing whitespace on either line). IMO, there are two
plausible fillings (assuming a limit of 78 characters) here:

    To: Amie Cawinski , Ichabod Tallman

and

    To: Amie Cawinski , Ichabod Tallman

of which the second will be uglified by an RFC 5322-conformant processor
into:

    To: Amie Cawinski , Ichabod Tallman

(note the extra space after the comma). I personally don't consider
either of

    To: Amie Cawinski , Ichabod Tallman

    To: Amie Cawinski , Ichabod Tallman

plausible as a presentation, but YMMV. So filling (to me) is about
presentation, not protocol conformance.

Anyway, I don't see how we can justify making *these* choices for the
user on the basis of a policy that really is about conservative
compliance with a wire protocol standard. For example, I personally
do not "fill" 81-character subject headers; it's just too ugly.
However, I might want my mail program to conservatively "fold" them,
especially for certain correspondents known to be stuck behind weird
MTAs or MUAs.

 > You might have a message body that contains code, in which case you
 > might want to fill the headers (using the terminology above), but
 > not fill the body.
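The contrast between "wrapping" as described here (work through each existing line, back up to the rightmost break point at the limit, carry on with the tail) and "filling" (throw away the existing breaks, then re-break) can be sketched with the standard `textwrap` module. This is an illustration under assumptions: `wrap_lines` and `fill` are hypothetical names, break points are whitespace only, and `textwrap`'s greedy line breaker stands in for the backtracking step.

```python
import textwrap


def wrap_lines(text: str, width: int = 78) -> list:
    """'Wrapping': break each existing line on its own, greedily.

    Existing line breaks are kept, so a short tail is never joined to the
    next line, which is where the uneven lines come from.
    """
    out = []
    for line in text.splitlines():
        # textwrap.wrap("") returns [], so keep blank lines explicitly.
        out.extend(textwrap.wrap(line, width, break_on_hyphens=False) or [""])
    return out


def fill(text: str, width: int = 78) -> list:
    """'Filling': discard existing breaks, collapse whitespace, re-break greedily."""
    return textwrap.wrap(" ".join(text.split()), width, break_on_hyphens=False)


sample = "aaaa bbbb cccc\ndddd"
print(wrap_lines(sample, 10))  # ['aaaa bbbb', 'cccc', 'dddd']
print(fill(sample, 10))        # ['aaaa bbbb', 'cccc dddd']
```

The stranded `cccc` line is the kind of unevenness described above; filling removes it only because it is allowed to join lines, which is exactly the presentation decision that should be left to the caller.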
That's another example of why control for filling has to be flexible
(and why IMHO filling should be called explicitly by the client).
However, if the receiving MUA is RFC 2045-conformant, the user cannot
tell that quoted-printable folding was used.

From v+python at g.nevcal.com Fri Jul 29 02:57:12 2011
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Thu, 28 Jul 2011 17:57:12 -0700
Subject: [Email-SIG] header folding
In-Reply-To: <20110726123827.3EC7FB14005@webabinitio.net>
References: <20110725194238.3ABCB2505A8@webabinitio.net>
 <87mxg1r8n4.fsf@uwakimon.sk.tsukuba.ac.jp>
 <20110726123827.3EC7FB14005@webabinitio.net>
Message-ID: <4E320568.3080705@g.nevcal.com>

On 7/26/2011 5:38 AM, R. David Murray wrote:
> What's the word for what is done when a text message is made to have
> a line length of less than 78 by using quoted printable (or base64)
> encoding? Is that also folding? If there's no existing term in common
> use, folding would make sense to me. So I have no objection to using
> 'fold' consistently in the api and code for these operations.

To me, "fold" means to divide _a_ long line into multiple short lines
(less than line length). (Barry calls this split, it seems.)

To me, "wrap" means to divide and join as necessary a set of lines
(sometimes/often a paragraph) to achieve some number of similar length
lines, not to exceed a line length limit, with possibly a shorter one at
the end.

To me, "fill" means to divide and join as necessary a set of lines
(sometimes/often a paragraph) to use as few lines as possible without
exceeding a line length limit, usually resulting in a shorter one at the
end. (Barry seems to have this same definition.)

For all the above, all divisions and joinings happen at white space
sequences, and white space sequences are considered irrelevant in
composition, and are generally reduced to a single space or newline as a
side effect.
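Glenn's "fill" and "wrap" differ in what they optimize: fewest lines versus lines of similar length under the same limit. One possible reading of his "wrap", sketched here as an assumption rather than anyone's stated algorithm, keeps the line count that a greedy fill needs but re-breaks at the narrowest width that still fits in that many lines. Because a greedy breaker never needs more lines as the width grows, that width can be found by binary search. `greedy_fill` and `balanced_wrap` are hypothetical names; whitespace is collapsed to single spaces, as Glenn describes.

```python
import textwrap


def greedy_fill(text: str, width: int) -> list:
    """'Fill': as few lines as possible, none longer than width."""
    return textwrap.wrap(" ".join(text.split()), width, break_on_hyphens=False)


def balanced_wrap(text: str, width: int = 78) -> list:
    """One reading of 'wrap': same line count as greedy_fill(), but re-broken
    at the narrowest width that needs no extra lines, so the lines come out
    closer to equal length."""
    words = text.split()
    if not words:
        return []
    target = len(greedy_fill(text, width))
    # Greedy line count never increases as the width grows, so binary search
    # for the smallest width that still fits in `target` lines.
    lo, hi = min(max(len(w) for w in words), width), width
    while lo < hi:
        mid = (lo + hi) // 2
        if len(greedy_fill(text, mid)) <= target:
            hi = mid
        else:
            lo = mid + 1
    return greedy_fill(text, lo)


sample = ("Folding is about invertibly reformatting a single logical line to make "
          "machines happy during transmission, while wrapping is about making people happy.")
for line in balanced_wrap(sample, 40):
    print(line)
```

The result never uses more lines than the fill and never has a longer longest line, which matches the "similar length lines, not to exceed a line length limit" wording; other balancing criteria (for example minimizing squared slack) are equally defensible.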
I think that if these terms are defined in the RFCs, that those
definitions should be preferred to mine. Some set of definitions needs
to be agreed upon, before sensible communication can be made about what
various algorithms should actually do, and what policy settings might be
named, and what algorithms they would invoke.

From stephen at xemacs.org Fri Jul 29 06:40:56 2011
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Fri, 29 Jul 2011 13:40:56 +0900
Subject: [Email-SIG] header folding
In-Reply-To: <4E320568.3080705@g.nevcal.com>
References: <20110725194238.3ABCB2505A8@webabinitio.net>
 <87mxg1r8n4.fsf@uwakimon.sk.tsukuba.ac.jp>
 <20110726123827.3EC7FB14005@webabinitio.net>
 <4E320568.3080705@g.nevcal.com>
Message-ID: <87mxfxpulj.fsf@uwakimon.sk.tsukuba.ac.jp>

Glenn Linderman writes:

 > To me, "wrap" means to divide and join as necessary a set of lines
 > (sometimes/often a paragraph) to achieve some number of similar length
 > lines, not to exceed a line length limit, with possibly a shorter one at
 > the end.

Typically such usage is in contexts where a paragraph is represented as
a single physical line, though. Your "set" is not part of "wrap" in my
dialect.

 > I think that if these terms are defined in the RFCs, that those
 > definitions should be preferred to mine.

"Fold" is defined per RFC 5322. The others don't seem to be. I think
"fold" should be used for the well-defined operation of header folding
(RFC 5322) and also for the well-defined operation of "inserting a soft
linebreak" in quoted-printable bodies (RFC 2045). I'm happy with
whatever usage others prefer for the other operations.
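The quoted-printable "soft linebreak" is the body-side counterpart of header folding: per RFC 2045 (section 6.7, rule 5), an encoder ends an over-long encoded line with `=`, the limit being 76 characters, and a decoder deletes the `=` together with the line break, so the original text is restored exactly. The standard-library `quopri` codec shows this; it is used here only as an illustration, not as the email package's body-folding policy under discussion.

```python
import quopri

# One 159-byte line: longer than the 76-character limit RFC 2045 sets for
# quoted-printable encoded lines.
original = b" ".join([b"folding"] * 20)

encoded = quopri.encodestring(original)
decoded = quopri.decodestring(encoded)

# Every physical line that had to be broken ends in "=", the soft line break.
print(encoded.decode("ascii"))
print(decoded == original)  # True
```

Because the soft break is invisible after decoding, a conformant MUA shows the reader the original unbroken line, which is why it qualifies as "folding" in the invertible sense rather than wrapping.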
From Axel.Rau at Chaos1.DE Fri Jul 29 13:18:20 2011
From: Axel.Rau at Chaos1.DE (Axel Rau)
Date: Fri, 29 Jul 2011 13:18:20 +0200
Subject: [Email-SIG] email-6.0.0.a1
In-Reply-To: <20110719212139.D5D732500D5@webabinitio.net>
References: <20110719212139.D5D732500D5@webabinitio.net>
Message-ID:

On 19.07.2011, at 23:21, R. David Murray wrote:

> I'm especially interested in anyone with a working program that uses
> email in 3.2: it should be completely backward compatible, and if it
> isn't I want to know ASAP.[*]

I just started testing a SpamCop reporter (800 lines of code).
Runs perfectly so far.

Axel
---
PGP-Key:29E99DD6 ? +49 151 2300 9283 ? computing @ chaos claudius

From Axel.Rau at Chaos1.DE Sat Jul 30 11:14:03 2011
From: Axel.Rau at Chaos1.DE (Axel Rau)
Date: Sat, 30 Jul 2011 11:14:03 +0200
Subject: [Email-SIG] email-6.0.0.a1
In-Reply-To:
References: <20110719212139.D5D732500D5@webabinitio.net>
Message-ID:

On 29.07.2011, at 13:18, Axel Rau wrote:

>
> On 19.07.2011, at 23:21, R. David Murray wrote:
>
>> I'm especially interested in anyone with a working program that uses
>> email in 3.2: it should be completely backward compatible, and if it
>> isn't I want to know ASAP.[*]
> I just started testing a SpamCop reporter (800 lines of code).
> Runs perfectly so far.
1st problem:
----
Traceback (most recent call last):
  File "/usr/local/etc/exim/erdb_bt.py", line 834, in <module>
    reporter.addReport(spam)
  File "/usr/local/etc/exim/erdb_bt.py", line 227, in addReport
    self.flushReports()
  File "/usr/local/etc/exim/erdb_bt.py", line 258, in flushReports
    smtp.send_message(self.msg)
  File "/usr/local/lib/python3.2/smtplib.py", line 790, in send_message
    g.flatten(msg, linesep='\r\n')
  File "/usr/local/lib/python3.2/email/generator.py", line 99, in flatten
    self._write(msg)
  File "/usr/local/lib/python3.2/email/generator.py", line 145, in _write
    self._dispatch(msg)
  File "/usr/local/lib/python3.2/email/generator.py", line 171, in _dispatch
    meth(msg)
  File "/usr/local/lib/python3.2/email/generator.py", line 232, in _handle_multipart
    g.flatten(part, unixfrom=False, linesep=self._NL)
  File "/usr/local/lib/python3.2/email/generator.py", line 99, in flatten
    self._write(msg)
  File "/usr/local/lib/python3.2/email/generator.py", line 152, in _write
    self._write_headers(msg)
  File "/usr/local/lib/python3.2/email/generator.py", line 373, in _write_headers
    for h, v in msg._headers:
ValueError: too many values to unpack (expected 2)
----
At least it seems not to be compatible with my bugs.
What am I doing wrong?

https://www.chaos1.de/svn-public/repos/network-tools/ERDB/trunk/database/erdb_bt.py

Axel
---
PGP-Key:29E99DD6 ? +49 151 2300 9283 ? computing @ chaos claudius
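The traceback ends inside the generator's `for h, v in msg._headers:` loop, reached via `_handle_multipart`, which means some subpart holds a stored header entry that is not a two-item (name, value) pair. The reporter's source is not reproduced here, so the following is only a hedged debugging aid, not a diagnosis: a hypothetical helper `find_malformed_headers` that walks every MIME part and reports the entries the generator would fail to unpack. It reads `Message._headers`, a private attribute whose layout may change between versions; the demo at the bottom simulates a bad entry rather than reproducing the real cause.

```python
from email.message import Message
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def find_malformed_headers(msg: Message) -> list:
    """Return (part_index, entry) for each stored header entry, in any MIME
    part of msg, that cannot be unpacked into a (name, value) pair.

    Debugging aid only: it reads Message._headers, a private attribute (the
    same list the generator unpacks in the traceback).
    """
    problems = []
    for index, part in enumerate(msg.walk()):
        for entry in part._headers:
            try:
                name, value = entry  # the same unpacking the generator performs
            except (TypeError, ValueError):
                problems.append((index, entry))
    return problems


# Simulated failure: a 3-item entry slipped into a subpart's header list.
demo = MIMEMultipart()
part = MIMEText("hi")
demo.attach(part)
part._headers.append(("X-Bad", "v", "extra"))
print(find_malformed_headers(demo))  # [(1, ('X-Bad', 'v', 'extra'))]
```

Calling such a helper just before `smtp.send_message(...)` would point at the offending part and entry; how the entry got there still has to be traced in the application or library code.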