[issue40546] Inconsistencies between PEG parser and traceback SyntaxErrors

Thu May 7 13:50:45 EDT 2020

Guido van Rossum <guido at python.org> added the comment:

I don't understand why the traceback module is implicated. It just formats the information in the SyntaxError object, same as the builtin printing for syntax errors. The key difference is always in what line/column/text is put in the SyntaxError object by whichever parser is being used.
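A minimal sketch of that point, for illustration only: the position data lives on the SyntaxError object itself, and traceback.format_exception_only() simply renders it. The helper name syntax_error_fields and the sample source string are assumptions made up for this example, and the exact offset reported may differ between parsers and Python versions.

```python
import traceback


def syntax_error_fields(source):
    """Compile `source` and return the SyntaxError plus its position fields.

    Hypothetical helper for illustration; returns (None, None) if the
    source compiles cleanly.
    """
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as exc:
        return exc, {"lineno": exc.lineno, "offset": exc.offset, "text": exc.text}
    return None, None


# The parser decides what goes into lineno/offset/text ...
exc, fields = syntax_error_fields("x = = 1\n")

# ... and the traceback module only formats what is already stored there.
formatted = traceback.format_exception_only(type(exc), exc)

print(fields)
print("".join(formatted), end="")
```

If two parsers produce different carets for the same source, comparing the fields dictionary is a direct way to see which value differs before any formatting is involved.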

In the past there was some misunderstanding about whether column numbers are 0-based (the leftmost column is numbered 0) or 1-based (the leftmost column is numbered 1), and at some point we discovered there was an inconsistency -- certain parts of the code put 0-based offsets in the SyntaxError object and other parts put 1-based offsets.

We then decided that the SyntaxError column offset should be 1-based and changed various bits of code to match. However, it's possible that we missed some. This is also still not clearly documented (e.g. the stdlib docs for SyntaxError don't mention it).

What complicates matters further is that the lowest-level C code in the tokenizer definitely uses 0-based offsets, which means that whenever we create a SyntaxError we have to add 1 to the offset. (You can see this happening if you look at various calls to PyErr_SyntaxLocationObject().)
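The conversion described above can be sketched in Python rather than C. This is only an illustration of the "+1" step, not the actual CPython code: make_syntax_error and its col0 parameter are hypothetical names, and the sample line is made up.

```python
def make_syntax_error(msg, filename, lineno, col0, text):
    """Build a SyntaxError from a 0-based column, as a tokenizer might report it.

    Hypothetical helper: the stored offset is made 1-based by adding 1.
    """
    return SyntaxError(msg, (filename, lineno, col0 + 1, text))


line = "x = = 1\n"
col0 = line.index("=", 3)  # 0-based column of the second "=" (index 4)

err = make_syntax_error("invalid syntax", "<example>", 1, col0, line)

# A formatter that assumes 1-based offsets subtracts 1 again to place the caret.
caret_line = " " * (err.offset - 1) + "^"

print(line, end="")
print(caret_line)
```

If the "+1" were forgotten at one call site, the caret would land one column to the left, which is the kind of inconsistency discussed in this issue.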

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue40546>
_______________________________________