[issue35547] email.parser / email.policy does correctly handle multiple RFC2047 encoded-word tokens across RFC5322 folded headers

Thu Dec 20 20:11:19 EST 2018

Martijn Pieters <mj at python.org> added the comment:

That regex is incorrect, I should not post untested code from a mobile phone. Corrected workaround with more context:

import re
from email.policy import EmailPolicy

class UnfoldingEncodedStringHeaderPolicy(EmailPolicy):
    def header_fetch_parse(self, name, value):
        # remove any leading whitespace from header lines
        # that separates apparent encoded-word token before further processing 
        # using somewhat crude CRLF-FWS-between-encoded-word matching
        value = re.sub(r'(?<=\?=)((?:\r\n|[\r\n])[\t ]+)(?==\?)', '', value)
        return super().header_fetch_parse(name, value)

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue35547>
_______________________________________