[issue24869] shlex lineno inaccurate with certain inputs

Gareth Rees report at bugs.python.org
Mon Jun 13 12:18:22 EDT 2016


Gareth Rees added the comment:

Just to restate the problem:

The use case is that when emitting an error message for a token, we want to include the number of the line containing the token (or the number of the line where the token started, if the token spans multiple lines, as it might if it's a string containing newlines).

But there is no way to satisfy this use case given the features of the shlex module. In particular, shlex.lineno (which looks as if it ought to help) is actually the line number of the first character that has not yet been consumed by the lexer, and in general this is not the same as the line number of the previous (or the next) token.
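To make the mismatch concrete, here is a minimal sketch using only the existing shlex API (shlex.shlex, get_token, eof and lineno). The input string is invented for illustration: "a" and "b" sit on line 1 and "c" on line 2.

```python
import shlex

# Three tokens on two lines: "a" and "b" on line 1, "c" on line 2.
lexer = shlex.shlex("a b\nc")

observed = []
while True:
    token = lexer.get_token()
    if token == lexer.eof:
        break
    # lineno reflects how far the lexer has read, not where the token began.
    observed.append((token, lexer.lineno))

for token, lineno in observed:
    print(f"{token!r}: lexer.lineno == {lineno}")
```

Here lineno is 1 after "a" but already 2 after "b", because the lexer has consumed the newline that terminated "b". So "b" (line 1) and "c" (line 2) report the same value.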

I can think of two alternatives that would satisfy the use case:

1. Instead of returning tokens as str objects, return them as instances of a subclass of str that has a property that gives the line number of the first character of the token. (Maybe it should also have properties for the column number of the first character, and the line and column number of the last character too? These properties would support better error messages.)

2. Add new methods that return tuples giving the token and its line number (and possibly column number etc. as in alternative 1).
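As a rough sketch of the shape alternative 2 might return, assuming a hypothetical TokenInfo tuple type (neither the name nor its fields exist in shlex today, and the values below are supplied by hand rather than by a lexer):

```python
from typing import NamedTuple


class TokenInfo(NamedTuple):
    """Hypothetical return type for alternative 2; not part of shlex."""
    token: str
    lineno: int  # line on which the token's first character appears


# A caller would unpack the pair when building an error message.
info = TokenInfo(token="c", lineno=2)
token, lineno = info
message = f"line {lineno}: unexpected token {token!r}"
print(message)
```

Existing callers of get_token would be unaffected, since the tuples would only come from the new methods.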

My preference would be for alternative (1), but I suppose there is a very small risk of breaking code that relies on get_token returning an instance of exactly str (for example, code that checks type(token) is str) rather than an instance of a subclass of str.
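A minimal sketch of alternative 1 and of the compatibility risk, assuming a hypothetical Token class (the name and the lineno attribute are illustrative only; the line number is passed in by hand here, whereas a real implementation would have the lexer record it when the token starts):

```python
class Token(str):
    """Hypothetical str subclass that remembers where the token started."""

    def __new__(cls, value, lineno):
        self = super().__new__(cls, value)
        self.lineno = lineno  # line of the token's first character
        return self


tok = Token("second", lineno=2)

# Ordinary str behaviour is preserved, so most callers would not notice.
same_value = (tok == "second")
is_str = isinstance(tok, str)

# Only an exact type check can tell the difference.
exact_type = (type(tok) is str)

print(tok, tok.lineno, same_value, is_str, exact_type)
```

Code that uses isinstance or plain string operations keeps working; only code that tests the exact type would see a change. Note also that str methods such as upper() return a plain str, so the line number does not survive transformations of the token.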

----------
nosy: +Gareth.Rees

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue24869>
_______________________________________

