[issue26686] email.parser stops parsing headers too soon when given a defective message.

Wed Aug 10 09:42:23 EDT 2016

R. David Murray added the comment:

I would prefer that we look ahead to see whether the subsequent line looks like a header.  That is more complicated to do, of course, and could still lead to false negatives.  However, I think it would probably retain enough backward compatibility to be acceptable.  It would also be sensible to make this a policy switch, and as I said elsewhere I'm fine with changing the defaults of the http policy even in 3.5.  (The downside of *that* is that I'm sure there are bugs hiding in the new header parsing code, so actually using the http policy to parse http headers will doubtless "allow" us to find some of them.)
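
As a rough illustration of the one-line lookahead idea, here is a minimal sketch. It is not the feedparser code: `looks_like_header`, `split_headers`, and the regular expression are hypothetical names and assumptions made for this example, and the input is assumed to be a list of lines with line endings already stripped.

```python
import re

# Assumption: a header line starts with an RFC 5322 style field name
# (printable ASCII except the colon), optional whitespace, then a colon.
HEADER_RE = re.compile(r'^[\x21-\x39\x3b-\x7e]+[ \t]*:')


def looks_like_header(line):
    """Return True if the line starts like 'Name: value'."""
    return HEADER_RE.match(line) is not None


def split_headers(lines):
    """Split lines into (header_lines, body_lines) using one-line lookahead.

    A defective (non-header) line stays in the header block only if the
    line right after it looks like a header.
    """
    headers = []
    i = 0
    while i < len(lines):
        line = lines[i]
        if line.strip() == "":
            # Blank line: normal end of the header block.
            return headers, lines[i + 1:]
        is_continuation = bool(headers) and line[0] in " \t"
        if looks_like_header(line) or is_continuation:
            headers.append(line)
            i += 1
            continue
        # Defective line: peek at the next line before giving up.
        next_line = lines[i + 1] if i + 1 < len(lines) else ""
        if looks_like_header(next_line):
            headers.append(line)  # keep it; a real parser would record a defect
            i += 1
            continue
        # Otherwise the header block ends here, as the current parser does.
        return headers, lines[i:]
    return headers, []
```

With this sketch, a stray line sandwiched between two headers stays in the header block, while a stray line followed by more non-header text still ends it.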

Even more complicated, but a better heuristic: look ahead to the next blank line, up to some limit (5 lines?), and if you do find something that looks like a header, also make sure that none of the intermediate lines look like a MIME boundary.  That still leaves the question of what to do with a source text that has only non-header lines up to the next blank line (this applies to one-line lookahead as well).  Maybe see if there is more text after the blank line, and if so assume the non-header is part of the header, otherwise not?
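
The bounded lookahead could be sketched along the following lines. Everything here is an assumption for illustration: the function name `should_continue_headers`, the header regular expression, and the crude "starts with `--`" boundary test (a real parser would compare against the message's actual boundary parameter). The open question about a block with no header before the blank line is not resolved; the sketch simply answers "no" in that case.

```python
import re

# Assumption: same header shape as an RFC 5322 field name followed by a colon.
HEADER_RE = re.compile(r'^[\x21-\x39\x3b-\x7e]+[ \t]*:')


def should_continue_headers(lines, start, limit=5):
    """Decide whether the defective line at lines[start] stays in the headers.

    Scan at most `limit` lines after it, stopping at the first blank line.
    Return True only if a header-like line is found and no line before it
    looks like a MIME boundary.
    """
    for line in lines[start + 1:start + 1 + limit]:
        if line.strip() == "":
            # Reached the blank line without seeing a header (unresolved case).
            return False
        if line.startswith("--"):
            # Crude boundary test: assume we have run into a MIME part.
            return False
        if HEADER_RE.match(line):
            return True
    # Ran out of lookahead budget (or input) without finding a header.
    return False
```

For example, `should_continue_headers(["Subject: a", "junk", "more junk", "To: b", "", "body"], 1)` finds `To: b` within the limit and returns True.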

Regardless, lookahead may be difficult to code.  So an alternative that uses your approach, but triggered by a policy setting on http, would be acceptable as far as backward compatibility goes.  If we want to, we could even make an internal http policy that is compat32 plus this new flag.
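
For the "compat32 plus this new flag" idea, one possible shape is a private subclass of `email.policy.Compat32` that adds the flag as a class attribute, which is how the policy framework expects new settings to be declared. The flag name `lookahead_on_defect` and the class name `_HTTPCompat32` are hypothetical; no such attribute exists in the email package, and nothing in the parser consults it in this sketch.

```python
from email.policy import Compat32, compat32


class _HTTPCompat32(Compat32):
    """Hypothetical internal policy: compat32 behavior plus one new switch."""

    # HYPOTHETICAL flag, not part of the real email package. The parser
    # would have to be taught to check it before changing its behavior.
    lookahead_on_defect = True


# HTTP uses CRLF line endings, so override linesep for this instance.
http_compat32 = _HTTPCompat32(linesep="\r\n")

# Policies are immutable; clone() is the supported way to flip a setting.
strict_variant = http_compat32.clone(lookahead_on_defect=False)
```

Because the flag lives only on the subclass, the stock `compat32` instance is left untouched and existing callers see no change.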

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue26686>
_______________________________________