[issue33337] Provide a supported Concrete Syntax Tree implementation in the standard library

Mon Apr 23 03:01:28 EDT 2018

Serhiy Storchaka <storchaka+cpython at gmail.com> added the comment:

> - the built-in AST increasingly modifies the tree before presenting it to user
>   code (constant folding moved to the AST in Python 3.7);

These modifications are applied only before bytecode generation. The AST presented to the user is not modified.
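A quick way to check this: `ast.parse()` keeps `1 + 2` as a `BinOp`, while the bytecode produced by `compile()` contains no binary operation, only the folded value. This is a minimal sketch. The helper names are made up for illustration, and the bytecode check is deliberately loose because the exact opcodes (`LOAD_CONST`, `RETURN_CONST`, `LOAD_SMALL_INT`, ...) differ between CPython versions.

```python
import ast
import dis

SOURCE = "1 + 2"

def parsed_node_type(src):
    """Class name of the top-level expression node returned by ast.parse()."""
    return type(ast.parse(src, mode="eval").body).__name__

def compiled_is_folded(src, expected):
    """True if the compiled bytecode has no BINARY_* opcode and uses `expected` directly."""
    code = compile(src, "<example>", "eval")
    instructions = list(dis.get_instructions(code))
    has_binop = any(ins.opname.startswith("BINARY") for ins in instructions)
    uses_value = any(ins.argval == expected for ins in instructions)
    return not has_binop and uses_value

print(parsed_node_type(SOURCE))        # BinOp
print(compiled_is_folded(SOURCE, 3))
```

The tree that user code sees is unchanged. Folding only shows up in the code object.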

> - the built-in tokenize.py can only be used to parse Python 3.7+ code;

Is this a problem? 2.7 is a dead end; its support will end in less than 2 years. Even 3.6 will move to the security-fixes-only stage shortly after 3.8 is released.

I'm in favor of updating Lib/lib2to3/pgen2/tokenize.py, but I don't understand why Lib/tokenize.py should be able to parse 2.7 code.

I'm in favor of reimplementing pgen in Python if this would simplify the code and the build process. Python code is simpler than C code, this code is not performance-critical, and in any case we need an external Python when modifying the grammar or bytecode.
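To give a rough idea of how compact this kind of code is in Python, here is one step that a parser generator like pgen performs: computing the FIRST set of every nonterminal, found by fixed-point iteration. This is a toy sketch and not pgen's actual code. pgen works on DFAs built from EBNF rules, while this version assumes plain BNF alternatives with no empty productions. The grammar and all names are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical toy grammar: nonterminal -> list of alternatives (symbol sequences).
# Any symbol that is not a key is treated as a terminal. No empty alternatives.
Grammar = Dict[str, List[List[str]]]

def first_sets(grammar: Grammar) -> Dict[str, Set[str]]:
    """Compute FIRST sets by iterating until no set grows any further."""
    first: Dict[str, Set[str]] = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                head = alt[0]
                # A terminal contributes itself; a nonterminal contributes its FIRST set.
                new = first[head] if head in grammar else {head}
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

TOY = {
    "stmt": [["expr_stmt"], ["pass_stmt"]],
    "expr_stmt": [["atom", "=", "atom"]],
    "pass_stmt": [["pass"]],
    "atom": [["NAME"], ["NUMBER"], ["(", "atom", ")"]],
}

print(sorted(first_sets(TOY)["stmt"]))
```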

See also issue30455, where I try to get rid of duplication by generating all token-related data and code from a single source (token.py or an external text file).
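The single-source idea can be sketched like this. One text spec lists the token names in order, each optionally followed by its exact string. From it, both Python constants and matching C `#define`s are generated, so the two can never disagree on numbering. The spec format, token subset, and function names are hypothetical illustrations. They are not the actual files or generator proposed in issue30455.

```python
# Hypothetical single source of truth: one token per line, optionally followed
# by its exact string in quotes. Not a real CPython file format.
TOKENS_SPEC = """
# name      exact string
ENDMARKER
NAME
NUMBER
LPAR        '('
RPAR        ')'
COLON       ':'
"""

def parse_spec(text):
    """Return a list of (name, exact_string_or_None) pairs in declaration order."""
    tokens = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, _, literal = line.partition(" ")
        literal = literal.strip()
        tokens.append((name, literal[1:-1] if literal else None))
    return tokens

def gen_python(tokens):
    """Generate Python source with numeric constants and an exact-string map."""
    lines = [f"{name} = {i}" for i, (name, _) in enumerate(tokens)]
    exact = ", ".join(f"{lit!r}: {name}" for name, lit in tokens if lit is not None)
    lines.append(f"EXACT_TOKEN_TYPES = {{{exact}}}")
    return "\n".join(lines) + "\n"

def gen_c_header(tokens):
    """Generate C #define lines that use the same numbering as gen_python()."""
    return "\n".join(f"#define {name} {i}" for i, (name, _) in enumerate(tokens)) + "\n"

spec = parse_spec(TOKENS_SPEC)
print(gen_python(spec))
print(gen_c_header(spec))
```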

For what purposes is a CST needed besides 2to3? The only one I know of is that it could help determine the correct positions inside docstrings for doctest and similar tools that need to process docstrings and report errors. This is not possible with the AST because of inlined '\n', escaped newlines, and string literal concatenation. Changes in 3.7 made this even worse (see issue32911).
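A small sketch of the information loss. For two adjacent literals where the first contains a '\n' escape, the AST reports a single already-decoded string. The tokenizer still shows each literal's original text and its own start position. The helper names are made up for illustration.

```python
import ast
import io
import tokenize

# Two adjacent string literals split across lines; the first contains a \n escape.
SOURCE = 'x = ("abc\\n"\n     "def")\n'

def ast_view(src):
    """String value and start line that the AST reports for the assigned value."""
    node = ast.parse(src).body[0].value
    return node.value, node.lineno

def token_view(src):
    """Source text and (row, col) start of each STRING token."""
    tokens = tokenize.generate_tokens(io.StringIO(src).readline)
    return [(tok.string, tok.start) for tok in tokens if tok.type == tokenize.STRING]

print(ast_view(SOURCE))    # ('abc\ndef', 1)
print(token_view(SOURCE))  # [('"abc\\n"', (1, 5)), ('"def"', (2, 5))]
```

From the AST alone, there is no way to tell that "def" begins on line 2, or that the newline in the value was written as a two-character escape.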

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue33337>
_______________________________________