[Email-SIG] continuation_ws in Generator and Header

Mark Sapiro mark at msapiro.net
Thu Jun 26 17:13:38 CEST 2008


Andi Albrecht wrote:

>There's currently a discussion about the different continuation
>whitespaces used in Generator._write_headers() and the Header class
>regarding issue 1974 (http://bugs.python.org/1974). Barry pointed out
>to move this discussion to this list, so here it is...
>
>The problem can be summarized as follows: The default continuation
>whitespace for long headers in the Header class is the space
>character, but in Generator._write_headers() a tab is used to create a
>Header class from a string. This resulted in at least two bug reports
>(1974, 1645148) where there were problems with some email clients
>(e.g. Outlook) that didn't display the subject as expected. It turned
>out that the problem only occurs when a string is used to set the
>subject header but not when using the Header class directly.


There are a couple of problems here that historically result from
ambiguities in RFC-822. RFC-2822, sec. 2.2.3 clarifies the standard
and is now clear on how folding and unfolding should be done, but the
email library doesn't do it that way, and for historical reasons, many
MUAs don't either.

I think the email library is pretty good about folding 'structured'
headers at higher-level syntactic breaks. Most problems seem to occur
in Subject: headers, which are unstructured and in which commas and
semicolons are just text, not field separators.

According to RFC-2822, we shouldn't have a continuation-ws character at
all, because we shouldn't be inserting anything other than a <CRLF>.
The real problem is that RFC-822 said we could insert <CRLF> followed
by whitespace, and MUAs, in an attempt to deal with that, tend to
remove at least the first whitespace character following the <CRLF>,
even though both RFC-822 and RFC-2822 say that unfolding is
accomplished by removing (only) any <CRLF> that is immediately
followed by whitespace.
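
To make the two unfolding behaviours concrete, here is a minimal
sketch. The function names are made up for illustration and are not
part of the email package; unfold_lenient() only models the MUA
behaviour described above and is an assumption, not a description of
any particular client.

```python
import re


def unfold_strict(raw):
    """Unfold as the RFCs describe: drop only a CRLF that is
    immediately followed by a space or tab, keeping that whitespace."""
    return re.sub(r"\r\n(?=[ \t])", "", raw)


def unfold_lenient(raw):
    """Hypothetical MUA behaviour: drop the CRLF and also the first
    whitespace character that follows it."""
    return re.sub(r"\r\n[ \t]", "", raw)


# A Subject: header folded with a tab as the continuation whitespace.
folded = "Subject: a long\r\n\tsubject line"

strict = unfold_strict(folded)    # the tab stays in the subject text
lenient = unfold_lenient(folded)  # the tab is gone, and so is the gap
```

With strict unfolding the tab ends up inside the subject text; with
the lenient variant the two words on either side of the fold are
joined with nothing between them.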

While I think the patch will help somewhat by providing consistency,
and by not putting a <tab> in Subject: headers that doesn't get
removed on unfolding, I need to look more closely to see whether an
extra <space> gets inserted when continuation_ws is <space>.
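
The question can be illustrated with a toy model of folding. This is
only a sketch under stated assumptions: fold_before_ws() and
fold_replacing_ws() are hypothetical helpers, not the email package's
actual folding code, and the fold position is picked by hand.

```python
import re

CRLF = "\r\n"


def unfold(raw):
    """Strict unfolding: remove a CRLF only when whitespace follows."""
    return re.sub(r"\r\n(?=[ \t])", "", raw)


def fold_before_ws(value, index):
    """RFC-2822 style: insert a bare CRLF in front of the whitespace
    already present at value[index]."""
    if value[index] not in " \t":
        raise ValueError("can only fold before existing whitespace")
    return value[:index] + CRLF + value[index:]


def fold_replacing_ws(value, index, continuation_ws):
    """Assumed continuation_ws style: replace the whitespace at
    value[index] with CRLF followed by continuation_ws."""
    if value[index] not in " \t":
        raise ValueError("can only fold at existing whitespace")
    return value[:index] + CRLF + continuation_ws + value[index + 1:]


subject = "Subject: a fairly long subject line"
cut = subject.index(" line")  # fold at the last space

round_trip = unfold(fold_before_ws(subject, cut))
with_tab = unfold(fold_replacing_ws(subject, cut, "\t"))
with_space = unfold(fold_replacing_ws(subject, cut, " "))
```

In this model a bare <CRLF> fold round-trips exactly, a <tab>
continuation leaves a tab where the space used to be, and a <space>
continuation round-trips only because it replaces the original space
rather than being added next to it.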

In any case, I think the goal should be RFC-2822 compliance, especially
since Outlook and Tbird appear to be going that way.

I may have the urge to look at this after Mailman 2.1.11 is released.

-- 
Mark Sapiro <mark at msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan


