[issue37764] email.Message.as_string infinite loop

Fri Aug 16 01:55:46 EDT 2019

Abhilash Raj <raj.abhilash1 at gmail.com> added the comment:

You have correctly identified that "=aa" is detected as an encoded-word escape and causes get_encoded_word to fail.

However, "=?utf-8?q?somevalue?=aa" should ideally get parsed as "somevalueaa" and not "=?utf-8?q?somevalue?=aa". This is because "=?utf-8?q?somevalue?=" is a valid encoded word; it is just not followed by whitespace.

modified   Lib/email/_header_value_parser.py
@@ -1037,7 +1037,10 @@ def get_encoded_word(value):
         raise errors.HeaderParseError(
             "expected encoded word but found {}".format(value))
     remstr = ''.join(remainder)
-    if len(remstr) > 1 and remstr[0] in hexdigits and remstr[1] in hexdigits:
+    if (len(remstr) > 1 and
+        remstr[0] in hexdigits and
+        remstr[1] in hexdigits and
+        tok.count('?') < 2):
         # The ? after the CTE was followed by an encoded word escape (=XX).
         rest, *remainder = remstr.split('?=', 1)

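This can be avoided by checking whether `?` occurs at least twice in `tok`. When it does, `tok` already holds the charset, the encoding and the payload, so the text after `?=` is not an `=XX` escape and the escape branch should be skipped.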
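
To make the effect of that condition easier to see, here is a minimal standalone sketch of just the token-splitting step. It is not the real get_encoded_word: the helper name `split_token` is hypothetical, it raises ValueError instead of errors.HeaderParseError, and it does no decoding at all.

```python
from string import hexdigits


def split_token(value):
    """Hypothetical helper mirroring only the splitting logic in the diff.

    Returns (tok, rest): the text between '=?' and the closing '?=',
    and whatever follows the closing '?='.
    """
    if not value.startswith('=?'):
        raise ValueError("expected encoded word but found {}".format(value))
    tok, *remainder = value[2:].split('?=', 1)
    if tok == value[2:]:
        # No closing '?=' anywhere in the input.
        raise ValueError("expected encoded word but found {}".format(value))
    remstr = ''.join(remainder)
    if (len(remstr) > 1 and
            remstr[0] in hexdigits and
            remstr[1] in hexdigits and
            tok.count('?') < 2):
        # The '?' after the CTE was followed by an escape (=XX), so the
        # first '?=' was not the terminator; extend tok to the next one.
        rest, *remainder = remstr.split('?=', 1)
        tok = tok + '?=' + rest
    return tok, ''.join(remainder)


# Complete encoded word followed directly by text: tok is left alone.
print(split_token("=?utf-8?q?somevalue?=aa"))  # ('utf-8?q?somevalue', 'aa')
# Payload starting with an =XX escape: tok is extended past the first '?='.
print(split_token("=?utf-8?q?=41bc?="))        # ('utf-8?q?=41bc', '')
```

In the first call `tok` already contains two `?`, so the trailing "aa" is returned as leftover text rather than being treated as an escape.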

The second bug, which needs a better test case, is that an invalid encoded word sends the parser into an infinite loop. You fixed that correctly in your PR, but the test case you used is more appropriate for the first issue.

You can fix both issues in the same PR. To do that, you need to add a test case for the second issue and a fix for the first one.

Looking into the PR now.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue37764>
_______________________________________