[issue16173] Wrong offset on SyntaxError when identifier contains non-ascii characters

Baptiste Mispelon report at bugs.python.org
Tue Oct 9 12:23:53 CEST 2012


New submission from Baptiste Mispelon:

When a syntax error happens, the exception that gets printed has an extra line with a caret that helps locate the error.

If the line also contains an identifier with non-ASCII characters, the caret is misaligned (too far to the right).

I investigated briefly, and it seems that the offset attribute of the SyntaxError holds the wrong value:

    for varname in ['a', 'é', '蟒']: # 1, 2 and 3 bytes in UTF-8
        try:
            exec("%s$" % varname) # '$' is invalid here, so this raises SyntaxError
        except SyntaxError as e:
            print(e.offset) # should be 2 (one character, then the '$')

The example above prints 2, 3, and 4 when it should print 2 every time.

It seems that the offset is calculated from the size in bytes of the text before the error (presumably its UTF-8 encoding) rather than from its size in characters.
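To illustrate the suspected cause, here is a minimal sketch that computes the column both ways for the three test lines. It is not CPython's tokenizer code: char_offset and byte_offset are hypothetical helpers, and UTF-8 is assumed as the encoding.

```python
# Hypothetical helpers for illustration only; not taken from CPython.

def char_offset(line, index):
    """1-based column of line[index], counted in characters."""
    return len(line[:index]) + 1


def byte_offset(line, index):
    """1-based column of line[index], counted in UTF-8 bytes (assumed encoding)."""
    return len(line[:index].encode("utf-8")) + 1


results = {}
# "\u00e9" is 'é' written as a single code point.
for varname in ["a", "\u00e9", "蟒"]:
    line = varname + "$"
    index = line.index("$")  # position of the offending character
    results[varname] = (char_offset(line, index), byte_offset(line, index))
    print(varname, results[varname])
```

The character-based column is 2 for all three lines, while the byte-based column comes out as 2, 3, and 4, which matches the values reported above.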

I've tested and reproduced the issue on 3.2.2 and on a recent clone of the Mercurial repository (dd5e98ddcd39).

----------
components: Interpreter Core
messages: 172470
nosy: bmispelon
priority: normal
severity: normal
status: open
title: Wrong offset on SyntaxError when identifier contains non-ascii characters
type: behavior
versions: Python 3.2, Python 3.4

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue16173>
_______________________________________

