[Email-SIG] continuation_ws in Generator and Header

R. David Murray rdmurray at bitdance.com
Mon Oct 15 18:06:54 CEST 2012


On Mon, 15 Oct 2012 08:34:10 -0700, "Jordan Hayes" <jmhayes at j-o-r-d-a-n.com> wrote:
> > I believe these issues have been dealt with, and that
> > we are now RFC 2822 (and 5322, for that matter) compliant.
> 
> Great!  Any chance of this making it into a patch for 2.1.15 ...?
> 
> > The only thing that continuation_ws is used for now is when
> > doing line wrapping on RFC 2047 tokens.  (And it now defaults
> > to a space, which may or may not be optimal, but certainly works.)
> 
> I thought the issue is that there should be no character at all: 2822 
> says to perform folding you simply insert a CRLF *before any whitespace* 
> so that unfolding is simply a matter of removing the CRLF.
> 
> Maybe an example would help.  Here's a header line:
> 
> Subject: This is overkill to fold, but legal
> 
> Here's one way to fold it:
> 
> Subject: This is overkill
>  to fold, but legal
> 
> Because the space between "overkill" and "to" is valid whitespace, it's 
> also valid as a signal that the second line is a continuation of the 
> first.  You don't have to insert another space (or as it does presently, 
> a tab!), you just have to insert CRLF.  Likewise on the way out, just 
> remove the CRLF.

Right, and the Python 3 email package does exactly that.  I don't remember
exactly which version I made which fixes in, but I remember Barry changed
tab to space in 3.1, and I just checked and it looks like I made the fix
that preserves the existing whitespace instead of using continuation_ws
in 3.2 (I rewrote and simplified the old wrapping algorithm).
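
A minimal sketch of that behavior, assuming Python 3.2 or later and using
the example header from the quoted message.  The maxlinelen value of 30 is
an arbitrary choice to force a fold, and continuation_ws is set to a tab
only to show that it plays no part in folding plain ASCII text:

```python
from email.header import Header

original = "This is overkill to fold, but legal"

# maxlinelen=30 is an arbitrary small limit chosen only to force a fold.
# continuation_ws is deliberately a tab to show it is not used for ASCII text.
header = Header(original, header_name="Subject", maxlinelen=30,
                continuation_ws="\t")
folded = header.encode()  # the default line separator is "\n"

print("Subject: " + folded)

# Unfolding is just removing the line break; no whitespace was added.
unfolded = folded.replace("\n", "")
print(unfolded == original)
```

Exactly where the break falls depends on the line length and on the
splitchars preference order, so the sketch checks only the round trip and
not a particular fold point.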

The place continuation_ws is still used is when you feed a non-ASCII
string that would result in a line longer than the line length to
Header.append.  In that case, RFC 2047 instructs us to break up the
encoded word, inserting whitespace between the pieces, such that no line
containing encoded words is longer than 76 characters (without the CR/LF).
So, this is the one place where we are *required* to insert whitespace,
which we are then required to remove when decoding.  And we still do
this; but as I said, that's the only place left where we actually use
the value of continuation_ws.  Everywhere else we just insert CR/LF as
needed in front of existing whitespace[*].

--David

[*] There's a caveat in the comments about this: because the pre-3.3 code
    had no real idea of the syntax of the headers, it may theoretically
    choose to insert a CR/LF in front of whitespace that is not actually
    legal folding whitespace.  This is, however, very unlikely, since
    there is very little (if any?) whitespace that is *not* legal
    folding whitespace.

