[issue24363] httplib fails to handle semivalid HTTP headers

Sat Nov 28 20:24:17 EST 2015

Martin Panter added the comment:

Since the Python 2 and Python 3 branches are different, two different patches would be needed here. Perhaps they could share common test cases though.

Michael: I presume your proposal is for Python 2. I don’t understand the re.findall() expression; could it be written more clearly, or failing that, explained in a comment? It looks like you are trying to skip over spaces at the start of the first header field name. Also, it seems to drop support for lines folded with tabs rather than spaces.

David: The headers-only mode wouldn’t make much difference because it only affects parsing the “payload”. As far as the email package is concerned, the payload should always be empty when used by HTTP’s parse_headers().

The simplest fix (at least for the Python 3 code) would be to check each line against email.feedparser.headerRE before adding it to the list of lines in the HTTP package. This would prevent the email parser from bailing from the header section early, which is the main problem.
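As a rough sketch of that check (not a patch against http.client): the helper name filter_header_lines() is made up for illustration, and it assumes the lines have already been decoded to str. The only library name it relies on is email.feedparser.headerRE, which is an undocumented module-level regular expression.

```python
from email.feedparser import headerRE


def filter_header_lines(lines):
    """Keep only the lines the email parser would accept as header lines.

    Hypothetical helper: stops at the blank line that ends the header
    section and silently drops anything headerRE does not match.
    """
    kept = []
    for line in lines:
        if line in ("\r\n", "\n", ""):
            break  # end of the header section
        if headerRE.match(line):
            kept.append(line)
        # else: the line would make the email parser stop early, so skip it
    return kept
```

Dropping the line is what makes this only a workaround: the header fields after the bad line survive, but the bad line itself is lost rather than being treated as a continuation.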

But that does seem like a bad hack, and it wouldn’t treat the offending line as a folded line, which Cory and David want. So I guess we need to make the email parser more flexible instead, maybe with a private FeedParser._parse_header_lines() method or something similar that replaces feed() and close() and defers most of the processing directly to _parse_headers().
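To illustrate the folded-line behaviour only, here is a standalone sketch that preprocesses the lines before they reach the email parser. It is not the FeedParser method suggested above, which does not exist; fold_stray_lines() is a hypothetical name, and the choice to drop a stray line that has no preceding header is an assumption.

```python
from email.feedparser import headerRE


def fold_stray_lines(lines):
    """Turn lines that are not valid header lines into continuation lines.

    Hypothetical helper: a stray line is prefixed with a space so the
    email parser folds it into the preceding header field.
    """
    result = []
    for line in lines:
        if line in ("\r\n", "\n", ""):
            break  # end of the header section
        if headerRE.match(line):
            result.append(line)
        elif result:
            # Leading whitespace marks the line as a continuation
            result.append(" " + line)
        # A stray first line has nothing to fold into; it is dropped here
    return result
```

With this approach the text of the offending line ends up in the value of the previous header field, and the fields that follow it are still parsed.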

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue24363>
_______________________________________