[issue12675] tokenize module happily tokenizes code with syntax errors

Gareth Rees report at bugs.python.org
Thu Aug 4 14:21:06 CEST 2011


Gareth Rees <gdr at garethrees.org> added the comment:

Having looked at some of the consumers of the tokenize module, I don't think my proposed solutions will work.

It seems that the resynchronization behaviour of tokenize.py is important for consumers that use it to transform arbitrary Python source code (like 2to3.py). These consumers rely on the "roundtrip" property that X == untokenize(tokenize(X)). So solution (1) is necessary for the handling of tokenization errors.
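A minimal sketch of the roundtrip property, shown here on well-formed source (the exact behaviour on erroneous source varies across Python versions). The sample `source` string is a made-up example; `generate_tokens` is used because it accepts str input directly:

```python
import io
import tokenize

# Hypothetical input: any well-formed source should survive the trip.
source = "def f(x):\n    return x + 1\n"

# generate_tokens() takes a readline callable yielding str lines.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

# With full 5-tuples (including start/end positions), untokenize can
# reproduce the original spacing exactly - the "roundtrip" property.
assert tokenize.untokenize(tokens) == source
```

Tools like 2to3 depend on this exactness: any change that alters what the token stream contains risks breaking the reconstruction.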

Also, the fact that TokenInfo is a 5-tuple is relied on in some places (e.g. lib2to3/patcomp.py line 38), so it can't be extended. And there are consumers (though none in the standard library) that rely on type=ERRORTOKEN being the way to detect errors in a tokenization stream. So I can't overload that field of the structure.
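To illustrate both constraints, a small sketch (the `source` string is an arbitrary example): consumers unpack TokenInfo as a plain 5-tuple, so appending a sixth field would raise "too many values to unpack"; and some test the type field against ERRORTOKEN, so that value can't be repurposed to carry extra error information:

```python
import io
import tokenize

source = "x = 1\n"
for tok in tokenize.generate_tokens(io.StringIO(source).readline):
    # Tuple-unpacking like this is why TokenInfo cannot grow a 6th field.
    tok_type, tok_string, start, end, line = tok
    # ...and some consumers detect lexical errors purely by this check,
    # so the meaning of the type field cannot be overloaded either.
    if tok_type == tokenize.ERRORTOKEN:
        print("error token at", start)
```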

Any good ideas for how to record the cause of error without breaking backwards compatibility?

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue12675>
_______________________________________

