[issue25388] tokenizer crash/misbehavior -- heap use-after-free
Terry J. Reedy
report at bugs.python.org
Fri Oct 16 22:36:08 EDT 2015
Terry J. Reedy added the comment:
According to https://docs.python.org/3/reference/lexical_analysis.html#lexical-analysis, the encoding of a source file (in Python 3) defaults to utf-8* and a decoding error is (should be) reported as a SyntaxError. Since b"\x7f\x00\x00\n''s\x01\xfd\n'S" is not valid utf-8 (the \xfd byte cannot start any UTF-8 sequence), I expect a UnicodeDecodeError converted to SyntaxError.
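[A quick sketch confirming the decode failure for the byte string quoted above; the offset and reason shown are what CPython's utf-8 codec reports:]

```python
# 0xFD can never appear in well-formed UTF-8, so decoding the quoted
# source bytes fails at that byte (offset 8).
src = b"\x7f\x00\x00\n''s\x01\xfd\n'S"
try:
    src.decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc.reason, "at byte offset", exc.start)
```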
* compile(bytes, filename, mode) defaults to latin1 instead. It has no decoding problem, but quits with "ValueError: source code string cannot contain null bytes". On 2.7, I might expect that as the error.
I expect '''self.assertIn(b"Non-UTF-8", res.err)''' to always fail because error messages are strings, not bytes. That aside, have you ever seen that particular text (as a string) in a SyntaxError message?
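[For context: whether res.err is str or bytes depends on how it was captured; stderr captured from a subprocess (as in test.support.script_helper) is bytes, and CPython does emit "Non-UTF-8" in the tokenizer's message for a file with no encoding declaration. A sketch, assuming a script_helper-style subprocess capture:]

```python
import os
import subprocess
import sys
import tempfile

# Write invalid UTF-8 into a throwaway script and run it in a
# subprocess, roughly what test.support.script_helper does for res.err.
fd, path = tempfile.mkstemp(suffix=".py")
try:
    with os.fdopen(fd, "wb") as f:
        f.write(b"'S\xfd\n")
    res = subprocess.run([sys.executable, path], capture_output=True)
finally:
    os.unlink(path)

# Subprocess-captured stderr is bytes, so a bytes comparison is
# consistent here; the tokenizer's SyntaxError text contains "Non-UTF-8".
print(isinstance(res.stderr, bytes), b"Non-UTF-8" in res.stderr)
```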
Why do you think the crash is during the tokenizing phase? I could not see anything in the ASan report.
----------
nosy: +terry.reedy
versions: +Python 3.5
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue25388>
_______________________________________