From jmhayes at j-o-r-d-a-n.com Sat Oct 13 21:50:13 2012
From: jmhayes at j-o-r-d-a-n.com (Jordan Hayes)
Date: Sat, 13 Oct 2012 12:50:13 -0700
Subject: [Email-SIG] continuation_ws in Generator and Header
Message-ID: <83006B0BDF5E428FA7E611E1DC5057B9@PAVEPAWS>

Just stumbled upon this today:

http://mail.python.org/pipermail/email-sig/2008-June/000394.html

Mark Sapiro writes:

> I may have the urge to look at this after Mailman 2.1.11 is released.

Any urge now? :-)

Thanks,

/jordan

From rdmurray at bitdance.com Mon Oct 15 16:42:34 2012
From: rdmurray at bitdance.com (R. David Murray)
Date: Mon, 15 Oct 2012 10:42:34 -0400
Subject: [Email-SIG] continuation_ws in Generator and Header
In-Reply-To: <83006B0BDF5E428FA7E611E1DC5057B9@PAVEPAWS>
References: <83006B0BDF5E428FA7E611E1DC5057B9@PAVEPAWS>
Message-ID: <20121015144234.8FC1C2500FA@webabinitio.net>

On Sat, 13 Oct 2012 12:50:13 -0700, "Jordan Hayes" wrote:
> Just stumbled upon this today:
>
> http://mail.python.org/pipermail/email-sig/2008-June/000394.html
>
> Mark Sapiro writes:
>
> > I may have the urge to look at this after Mailman 2.1.11 is released.
>
> Any urge now? :-)

I believe these issues have been dealt with, and that we are now RFC 2822 (and 5322, for that matter) compliant. Except, of course, that we don't do any unfolding (but see next paragraph). Part of this was done in 3.1, more of it in 3.2, and the final bits in 3.3. In 3.3 we even fixed the whitespace issues in our handling of RFC 2047, thanks to Ralf Schlatterbeck. The only thing that continuation_ws is used for now is when doing line wrapping on RFC 2047 tokens. (And it now defaults to a space, which may or may not be optimal, but certainly works.)

In addition, the new (provisional) email policy in the 3.3 email library has both an unfolding and a folding algorithm that are supposed to be fully RFC 2822/5322 compliant, including that the folding algorithm implements folding according to the RFC's syntax.
That is, it really knows where the higher level syntactic breaks are on a per-header-type basis and folds there preferentially. (Since it is a new algorithm I am sure there are undiscovered bugs[*], which is part of the reason it is provisional code. Another reason is that not all of the RFC's header types have been fleshed out, so I shouldn't really say "fully" yet...)

--David

[*] not to mention the fact that it is a really really *ugly* algorithm that I need to completely rewrite now that I at least got it working (modulo the undiscovered bugs) and understand the issues involved better.

From jmhayes at j-o-r-d-a-n.com Mon Oct 15 17:34:10 2012
From: jmhayes at j-o-r-d-a-n.com (Jordan Hayes)
Date: Mon, 15 Oct 2012 08:34:10 -0700
Subject: [Email-SIG] continuation_ws in Generator and Header
References: <83006B0BDF5E428FA7E611E1DC5057B9@PAVEPAWS> <20121015144234.8FC1C2500FA@webabinitio.net>
Message-ID:

> I believe these issues have been dealt with, and that
> we are now RFC 2822 (and 5322, for that matter) compliant.

Great! Any chance of this making it into a patch for 2.1.15 ...?

> The only thing that continuation_ws is used for now is when
> doing line wrapping on RFC 2047 tokens. (And it now defaults
> to a space, which may or may not be optimal, but certainly works.)

I thought the issue is that there should be no character at all: 2822 says to perform folding you simply insert a CRLF *before any whitespace* so that unfolding is simply a matter of removing the CRLF.

Maybe an example would help. Here's a header line:

Subject: This is overkill to fold, but legal

Here's one way to fold it:

Subject: This is overkill
 to fold, but legal

Because the space between "overkill" and "to" is valid whitespace, it's also valid as a signal that the second line is a continuation of the first. You don't have to insert another space (or as it does presently, a tab!), you just have to insert CRLF. Likewise on the way out, just remove the CRLF.
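
The fold/unfold rule described above can be sketched in a few lines of Python. This is only an illustration of the rule, not the email package's implementation: the function names fold_before_whitespace and unfold, and the max_len parameter, are hypothetical, and the sketch ignores header syntax entirely (it treats any space or tab as a legal fold point).

```python
import re

CRLF = "\r\n"

def fold_before_whitespace(header, max_len=78):
    """Fold a header by inserting CRLF in front of existing whitespace.

    No whitespace is added or removed; the space or tab that was already
    in the header becomes the leading whitespace of the continuation line.
    """
    lines = []
    rest = header
    while len(rest) > max_len:
        # Last space or tab that keeps the current line within max_len.
        # Index 0 is skipped so a continuation line is never emptied.
        cut = max(rest.rfind(" ", 1, max_len + 1),
                  rest.rfind("\t", 1, max_len + 1))
        if cut <= 0:
            break  # no usable fold point; leave the remainder long
        lines.append(rest[:cut])
        rest = rest[cut:]
    lines.append(rest)
    return CRLF.join(lines)

def unfold(folded):
    """Unfold by removing each CRLF that is followed by a space or tab."""
    return re.sub(r"\r\n(?=[ \t])", "", folded)

header = "Subject: This is overkill to fold, but legal"
folded = fold_before_whitespace(header, max_len=25)
print(repr(folded))
print(unfold(folded) == header)
```

With max_len=25 the fold lands in front of the space between "overkill" and "to", and unfolding gives back the original header unchanged.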

So I think for Mailman, which can modify the Subject: line, what you need to do is first unfold the line if it's already folded; apply any changes; and then optionally refold, if it's now longer than you'd like it to be.

> In addition, the new (provisional) email policy in the 3.3 email
> library has both an unfolding and a folding algorithm that are
> supposed to be fully RFC 2822/5322 compliant, including that the
> folding algorithm implements folding according to the RFC's syntax.
> That is, it really knows where the higher level syntactic breaks are
> on a per-header-type basis and folds there preferentially.

Sounds great.

Thanks,

/jordan

From barry at python.org Mon Oct 15 17:42:42 2012
From: barry at python.org (Barry Warsaw)
Date: Mon, 15 Oct 2012 11:42:42 -0400
Subject: [Email-SIG] continuation_ws in Generator and Header
In-Reply-To:
References: <83006B0BDF5E428FA7E611E1DC5057B9@PAVEPAWS> <20121015144234.8FC1C2500FA@webabinitio.net>
Message-ID: <20121015114242.1f618b13@resist.wooz.org>

On Oct 15, 2012, at 08:34 AM, Jordan Hayes wrote:

>> I believe these issues have been dealt with, and that
>> we are now RFC 2822 (and 5322, for that matter) compliant.
>
>Great! Any chance of this making it into a patch for 2.1.15 ...?

That's Mailman 2.1.15, which is already out, so you probably mean 2.1.16. But Mailman 2.1 pretty much uses whatever is available in Python 2 - we've been down the road before of providing a separate email package, and I think that's problematic.

FWIW, I would *dearly* love Mailman 3 to be a Python 3 project, and even require Python 3.3 so we could take advantage of all the nice email policy stuff right out of the box. I can't currently do that because enough of our dependencies haven't yet been ported (ping me if you want to help with that :).

Cheers,
-Barry

From rdmurray at bitdance.com Mon Oct 15 18:06:54 2012
From: rdmurray at bitdance.com (R.
David Murray)
Date: Mon, 15 Oct 2012 12:06:54 -0400
Subject: [Email-SIG] continuation_ws in Generator and Header
In-Reply-To:
References: <83006B0BDF5E428FA7E611E1DC5057B9@PAVEPAWS> <20121015144234.8FC1C2500FA@webabinitio.net>
Message-ID: <20121015160655.508BD2500FA@webabinitio.net>

On Mon, 15 Oct 2012 08:34:10 -0700, "Jordan Hayes" wrote:
> > I believe these issues have been dealt with, and that
> > we are now RFC 2822 (and 5322, for that matter) compliant.
>
> Great! Any chance of this making it into a patch for 2.1.15 ...?
>
> > The only thing that continuation_ws is used for now is when
> > doing line wrapping on RFC 2047 tokens. (And it now defaults
> > to a space, which may or may not be optimal, but certainly works.)
>
> I thought the issue is that there should be no character at all: 2822
> says to perform folding you simply insert a CRLF *before any whitespace*
> so that unfolding is simply a matter of removing the CRLF.
>
> Maybe an example would help. Here's a header line:
>
> Subject: This is overkill to fold, but legal
>
> Here's one way to fold it:
>
> Subject: This is overkill
>  to fold, but legal
>
> Because the space between "overkill" and "to" is valid whitespace, it's
> also valid as a signal that the second line is a continuation of the
> first. You don't have to insert another space (or as it does presently,
> a tab!), you just have to insert CRLF. Likewise on the way out, just
> remove the CRLF.

Right, and the Python3 email package does exactly that. I don't remember exactly which version I made which fixes in, but I remember Barry changed tab to space in 3.1, and I just checked and it looks like I made the fix that preserves the existing whitespace instead of using continuation_ws in 3.2 (I rewrote and simplified the old wrapping algorithm).

The place continuation_ws is still used is when you feed a non-ASCII string that would result in a line longer than the line length to Header.append.
In that case, RFC 2047 instructs us to break up the encoded word, inserting whitespace between the pieces, such that no line containing encoded words is longer than 76 characters (without the CR/LF). So, this is the one place where we are *required* to insert whitespace, which we are then required to remove when decoding. And we still do this; but as I said, that's the only place left where we actually use the value of continuation_ws. Everywhere else we just insert CR/LF as needed in front of existing whitespace[*].

--David

[*] There's a caveat in the comments about this: because the pre-3.3 code had no real idea of the syntax of the headers, it may theoretically choose to insert a CR/LF in front of whitespace that is not actually legal folding whitespace. This is, however, very unlikely, since there is very little (if any?) whitespace that is *not* legal folding whitespace.
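
The remaining use of continuation_ws described above can be observed with email.header.Header. The following is a minimal sketch, assuming Python 3.3 or later; the sample text and the variable names are made up for illustration, and the exact split points depend on the library version, so no particular output is claimed here.

```python
from email.header import Header, decode_header, make_header

# Hypothetical non-ASCII subject, long enough that it cannot fit in a
# single RFC 2047 encoded word within the 76-character line limit.
text = "Grüße aus München: " + "ein sehr langer Betreff " * 6

h = Header(text, charset="utf-8", maxlinelen=76,
           header_name="Subject", continuation_ws=" ")
encoded = h.encode()          # lines are joined with "\n" by default
lines = encoded.split("\n")

for line in lines:
    print(len(line), repr(line))

# Decoding drops the whitespace that was inserted between adjacent
# encoded words, so the original text comes back.
decoded = str(make_header(decode_header(encoded)))
print(decoded == text)
```

Each continuation line starts with the continuation_ws value followed by another encoded word; that inserted whitespace is the part that has to be removed again on decoding.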