[issue35547] email.parser / email.policy does not correctly handle multiple RFC2047 encoded-word tokens across RFC5322 folded headers

Thu Dec 20 19:52:03 EST 2018

Martijn Pieters <mj at python.org> added the comment:

Right, re-educating myself on the MIME RFCs, I found https://bugs.python.org/issue1372770 where the same issue is being discussed for previous incarnations of the email library.

Removing the FWS after CRLF is the wrong thing to do, **unless** that FWS is separating RFC2047 encoded-word tokens. The work-around regex is a bit more complicated, but ideally the EW handling should use a specialised FWS token to delimit encoded-word sections, one that renders to '' as is already done in unstructured headers, but everywhere. That matters because, in practice, there are email clients out there that use EW in structured headers regardless.

Regex to work around this:

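import re

# crude CRLF-FWS-between-encoded-word matching; a capture group replaces the
# look-behind because Python's re only accepts fixed-width look-behind patterns
value = re.sub(r'(\?=(?:\r\n|\n|\r))[\t ]+(?==\?)', r'\1', value)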
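For illustration, here is a minimal, self-contained sketch of how that workaround behaves on a folded header. The helper name `strip_fws_between_encoded_words` and the sample header are hypothetical, made up for this example. The pattern uses a capture group rather than a look-behind, on the assumption that the line break itself should be kept and only the whitespace after it dropped, and because Python's `re` rejects variable-width look-behinds.

```python
import re

# Hypothetical helper, not part of the email package. It matches the end of
# one encoded-word ("?="), a line break, the folding whitespace, and then
# requires the start of another encoded-word ("=?") right after.
_EW_FOLD = re.compile(r'(\?=(?:\r\n|\n|\r))[\t ]+(?==\?)')


def strip_fws_between_encoded_words(value: str) -> str:
    """Drop folding whitespace after a line break, but only when it sits
    between two RFC 2047 encoded-words; every other fold is left alone."""
    # \1 puts back the "?=" and the line break consumed by the capture group.
    return _EW_FOLD.sub(r'\1', value)


# Made-up sample: two encoded-words split over a fold, then plain text.
folded = (
    "Subject: =?utf-8?q?Hello_?=\r\n"
    " =?utf-8?q?World?=\r\n"
    " plain text"
)
result = strip_fws_between_encoded_words(folded)
```

Only the whitespace between the two encoded-words is removed; the fold in front of `plain text` is untouched, which matches the "unless" condition described above.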

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue35547>
_______________________________________