[Email-SIG] Fixing the header wrapping algorithm

R. David Murray rdmurray at bitdance.com
Sun Apr 10 22:46:33 CEST 2011


I just posted a significant patch on issue 11492.  After banging my
head on the existing header folding algorithm and watching the if/else
cases proliferate and break other things, I decided to try a rewrite.
It is 70 lines shorter, yet it passes all the existing tests, the new
ones I posted to the bug, and some additional ones.

With the new algorithm I'm changing which interpretation of RFC 2822
we implement.  The old algorithm breaks on the 'splitchars'
unconditionally, introducing whitespace if there isn't whitespace
there already.  This seems wrong to me.  When 2822 talks about higher-level
syntactic breaks, I believe it means only breaks at points where FWS
is present.  So the new algorithm breaks only where there is at least
one tab or space, but prefers to break after a splitchar when the
splitchar is followed by a tab or space.
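For illustration, here is a minimal Python sketch of the breaking rule just described.  It is not the code from the patch on issue 11492 and not an email.header API.  The function fold_header, its maxlen and splitchars parameters, and their defaults (78 columns, ';' and ',') are all assumptions made for this example.  The sketch only folds in front of an existing run of spaces or tabs.  Among the break points that keep a line within maxlen, it picks the last one that follows a splitchar, and otherwise falls back to the last fitting whitespace.

```python
import re


def fold_header(name, value, maxlen=78, splitchars=';,'):
    """Hypothetical sketch: fold 'name: value' into lines of <= maxlen chars.

    A fold is only made in front of an existing run of spaces/tabs (FWS),
    never by inserting new whitespace.  Among the break points that keep the
    line within maxlen, one right after a splitchar is preferred; otherwise
    the last fitting whitespace is used.  An overlong run with no whitespace
    is left as-is.
    """
    tokens = re.split(r'([ \t]+)', value.strip())
    # Segments are (leading whitespace, word).  The field name is glued to
    # the first word so we never fold directly after the colon.
    segments = [('', f'{name}: {tokens[0]}')]
    segments += list(zip(tokens[1::2], tokens[2::2]))

    def width(segs):
        return sum(len(ws) + len(word) for ws, word in segs)

    def best_break(segs):
        # Index i means "fold before segs[i]"; keep only heads that fit.
        fitting = [i for i in range(1, len(segs))
                   if width(segs[:i]) <= maxlen]
        if not fitting:
            return 1  # first word alone is too long; fold as early as possible
        after_split = [i for i in fitting
                       if segs[i - 1][1].endswith(tuple(splitchars))]
        return (after_split or fitting)[-1]

    lines, current = [], [segments[0]]
    for seg in segments[1:]:
        current.append(seg)
        while len(current) > 1 and width(current) > maxlen:
            i = best_break(current)
            lines.append(''.join(ws + word for ws, word in current[:i]))
            current = current[i:]  # its leading whitespace starts the next line
    lines.append(''.join(ws + word for ws, word in current))
    return lines


# On the wire the lines would be joined with CRLF.  Each continuation line
# keeps its original leading whitespace, so unfolding just removes the CRLFs.
print(fold_header('X-Test', 'alpha beta; gamma delta epsilon', maxlen=30))
# → ['X-Test: alpha beta;', ' gamma delta epsilon']
```

A plain greedy fold of the same header would produce 'X-Test: alpha beta; gamma' and ' delta epsilon'.  The sketch instead breaks after the semicolon.  A value such as 'aaaa;bbbb;cccc;dddd' contains no whitespace at all, so the sketch leaves it as one overlong line rather than inserting whitespace the way the old algorithm does.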

We still aren't doing it "right", because we aren't paying attention to
the real syntax of structured headers, and we might inadvertently break
at whitespace that is not legitimate FWS.  Those cases should be pretty
darn rare, though, and the old algorithm could make the same mistake.

The patch adjusts a few tests that were checking the old line-breaking
behavior, which failed to break long lines that contained whitespace
whenever they also contained splitchars.  One of those tests even has a
comment saying that the behavior is wrong.

Since this fixes bugs and improves RFC compliance, I plan to apply
it to 3.2.  (As noted in the issue, 3.1 has a test failure I don't
understand; really I ought to figure it out, and perhaps I will before
the time comes that I can actually apply the patch.)

diffstat says the header.py portion of the patch is 107 lines added and
178 deleted, so it is a non-trivial change.  Reviews welcome.

--David

