[Email-SIG] header folding

Thu Jul 28 05:09:53 CEST 2011

Barry Warsaw <barry at python.org> writes:

 > That's at least what I think of, and I do think we could
 > have two knobs to control the different functionality:
 > 
 > - To 'split' a line means to take a line longer than a specified maximum, and
 >   make it fit into the maximum line length, splitting at whitespace or other
 >   semantic separators.

In the case of headers, "folding" is hallowed usage (going back to at
least RFC 733), and is very precisely defined by RFC 5322.  If we are
going to do something non-RFC conformant (yeah, right, we might do
that, eh?), "splitting" would be better.  If our implementation is
intended to be conformant, I think "folding" is preferable both for
familiarity and ease of reference ("look it up in RFC 5322").

I think the generalization to bodies is reasonable, although I haven't
found any RFC usage of "folding" in that context in a quick look.

 > - To 'fill' a header means to take the logical contents of the
 > header and recombine and resplit it so that each line is as close
 > to the maximum line length as possible.  My analogy here is Emacs's
 > M-q (fill-paragraph).

 > What then is [...] "wrapping"?  Maybe no different than the above.

In my dialect, what you describe as "filling" is (at least
potentially) far more sophisticated than what I mean by "wrapping".
Wrapping moves forward through each line and at the maximum length
backtracks to the rightmost break point in the line, breaking there,
then continuing the process in the tail line.  This can, and in my
experience often does, result in very uneven lines.
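Read literally, the wrapping just described is a greedy loop. Below is a minimal Python sketch of it. It assumes the header arrives as one unfolded string and treats only space and tab as break points (real address headers have other semantic break points). `wrap_header` is a hypothetical name, not part of the email package.

```python
def wrap_header(line: str, maxlen: int = 78) -> list[str]:
    """Greedy wrapping: when the line is too long, back up to the
    rightmost whitespace that fits, break there, then repeat on the tail."""
    pieces = []
    while len(line) > maxlen:
        # Rightmost space or tab at index 1..maxlen; breaking there keeps the
        # current piece within maxlen. Index 0 is skipped so a continuation
        # line (which starts with whitespace) is never emptied.
        cut = max(line.rfind(" ", 1, maxlen + 1),
                  line.rfind("\t", 1, maxlen + 1))
        if cut <= 0:
            # No usable break point: leave the overlong tail as it is.
            break
        pieces.append(line[:cut])
        # The whitespace stays at the start of the tail, as folding requires.
        line = line[cut:]
    pieces.append(line)
    return pieces


# Usage: joining the pieces with CRLF gives a folded header.
subject = "Subject: " + " ".join(f"word{i}" for i in range(40))
print("\r\n".join(wrap_header(subject, 30)))
```

Because every break falls immediately before existing whitespace, joining the pieces with CRLF is an RFC 5322 folding. The loop never looks ahead, so nothing stops the final line from being much shorter than the others.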

However, I don't think we're talking about filling here.  Filling IMHO
should be implemented by the email module, but it should be called
explicitly by the client, not imposed internally on the basis of a
global policy.
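One way to make filling an explicit, client-called operation is sketched below. `fill_header` is hypothetical and not an email-package API. The sketch assumes the simplest possible fill: treat the header as whitespace-separated words (ignoring quoted strings and comments), then repack them greedily. As noted above, a real filler could be far more sophisticated, for example by preferring to break after the comma between addresses.

```python
def fill_header(header: str, maxlen: int = 78) -> str:
    """Hypothetical client-side fill: recombine the words, then resplit
    so each line is packed as full as possible (greedily)."""
    # Unfold and recombine in one step: str.split() with no argument splits
    # on any run of whitespace, including the CRLF of an existing fold.
    words = header.split()
    if not words:
        return ""
    lines = [words[0]]
    for word in words[1:]:
        if len(lines[-1]) + 1 + len(word) <= maxlen:
            lines[-1] += " " + word
        else:
            # Start a continuation line; the leading space makes it a fold.
            lines.append(" " + word)
    return "\r\n".join(lines)


ugly = "To: Amie Cawinski <abc at abc.org>, Ichabod\r\n Tallman <imt at cow.org>"
print(fill_header(ugly))
```

Runs of whitespace collapse to single spaces, so the result need not unfold back to the input. That is a presentation decision, which is why it fits better under explicit client control than in a global conformance policy.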

Consider the following ugly header (which is somewhat unlikely to
actually appear in a real use case, although it could easily result
from cut-and-paste into an MUA's To field):

To: Amie Cawinski <abc at abc.org>, Ichabod
 Tallman <imt at cow.org>

(there is no trailing whitespace on either line).  IMO, there are two
plausible fillings (assuming a limit of 78 characters) here:

To: Amie Cawinski <abc at abc.org>, Ichabod Tallman <imt at cow.org>

and

To: Amie Cawinski <abc at abc.org>,
    Ichabod Tallman <imt at cow.org>

of which the second will be uglified by an RFC 5322-conformant
processor into:

To: Amie Cawinski <abc at abc.org>,    Ichabod Tallman <imt at cow.org>

(note the extra whitespace after the comma).  I personally don't consider
either of

To: Amie Cawinski <abc at abc.org>,
 Ichabod Tallman <imt at cow.org>

or

To: Amie Cawinski <abc at abc.org>,
<TAB>Ichabod Tallman <imt at cow.org>

plausible as a presentation, but YMMV.  So filling (to me) is about
presentation, not protocol conformance.
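The uglified unfolding of the four-space filling shown earlier is purely mechanical. RFC 5322 (section 2.2.3) unfolding removes only the CRLF and keeps the indentation that follows it. Here is a minimal Python sketch; the `unfold` helper is hypothetical and assumes CRLF line endings.

```python
import re


def unfold(header: str) -> str:
    """Remove each CRLF that is immediately followed by a space or tab;
    the whitespace itself is kept."""
    return re.sub(r"\r\n(?=[ \t])", "", header)


filled = "To: Amie Cawinski <abc at abc.org>,\r\n    Ichabod Tallman <imt at cow.org>"
print(unfold(filled))
```

The four indentation spaces survive as four spaces after the comma, which is what the processed header above shows.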

Anyway, I don't see how we can justify making *these* choices for the
user on the basis of a policy that really is about conservative
compliance to a wire protocol standard.  For example, I personally do
not "fill" 81-character subject headers; it's just too ugly.  However,
I might want my mail program to conservatively "fold" them, especially
for certain correspondents known to be stuck behind weird MTAs or MUAs.

 > You might have a message body that contains code, in which case you
 > might want to fill the headers (using the terminology above), but
 > not fill the body.

That's another example of why control for filling has to be flexible
(and why IMHO filling should be called explicitly by the client).

However, if the receiving MUA is RFC 2045-conformant, the user cannot
tell that quoted-printable folding was used.
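Python's standard `quopri` module makes this concrete. The encoder splits an overlong body line with soft line breaks (a trailing "="), and the decoder joins the pieces again. An RFC 2045-conformant reader therefore sees the original line. The sample text is arbitrary.

```python
import quopri

# One body line longer than the 76-character limit RFC 2045 sets for
# quoted-printable encoded lines.
body = ("The quick brown fox jumps over the lazy dog. " * 4).strip().encode("ascii")

encoded = quopri.encodestring(body)
# The encoder inserts soft line breaks ("=" at the end of a line)...
print(encoded.decode("ascii"))

# ...which the decoder removes, restoring the original single line.
assert quopri.decodestring(encoded) == body
```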