[issue36911] ast.parse outputs ast.Strs which do not differentiate between the ASCII codepoint 12 (literal new line) and the ASCII codepoints 134 and 156 ("\n")

Matthias Bussonnier report at bugs.python.org
Mon May 13 22:54:30 EDT 2019


Matthias Bussonnier <bussonniermatthias at gmail.com> added the comment:

I believe this one happens even before the AST, in the tokenizer. The AST does also normalise identifiers, though: for example, “ε” (U+03B5 GREEK SMALL LETTER EPSILON) and “ϵ” (U+03F5 GREEK LUNATE EPSILON SYMBOL) are normalised to the same identifier, which is problematic because they look different but end up being the same name.
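To make both behaviours concrete, here is a minimal sketch, assuming Python 3.8 or later (where string literals parse to ast.Constant with a .value attribute; older versions expose ast.Str with .s instead). The variable names are only illustrative.

```python
import ast
import unicodedata

# Two different source texts for "the same" string:
escaped = "'a\\nb'"      # source holds a backslash followed by the letter n
literal = "'''a\nb'''"   # source holds an actual line break

# Assumes Python 3.8+: the literal parses to ast.Constant with .value
v1 = ast.parse(escaped, mode="eval").body.value
v2 = ast.parse(literal, mode="eval").body.value
same_string = v1 == v2   # both spellings yield the same parsed value

# Identifier normalisation: lunate epsilon vs. small epsilon
lunate = "\u03f5"        # GREEK LUNATE EPSILON SYMBOL
small = "\u03b5"         # GREEK SMALL LETTER EPSILON
name = ast.parse(lunate + " = 1").body[0].targets[0].id
nfkc = unicodedata.normalize("NFKC", lunate)

print(same_string, name == small, nfkc == small)
```

In both cases the original spelling is gone by the time the tree is available, so a tool working only from the AST cannot report which form the author typed.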

I'd be interested in an opt-in flag to disable this normalisation, which might be useful for some linting tools. (I have a prototype that does this for the identifier normalisation in ast, but I have not looked at the tokenizer.)
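Until such a flag exists, a linting tool could approximate the identifier check outside the AST. The sketch below is not the prototype mentioned above; it assumes that the standard tokenize module reports NAME tokens with their original spelling, and non_normalized_names is a hypothetical helper name.

```python
import io
import tokenize
import unicodedata


def non_normalized_names(source):
    """Yield (line, column, spelling, normalized) for names NFKC would change."""
    # Assumption: tokenize yields NAME tokens as written, without normalising.
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME:
            normalized = unicodedata.normalize("NFKC", tok.string)
            if normalized != tok.string:
                yield tok.start[0], tok.start[1], tok.string, normalized


# Line 1 uses the lunate epsilon, line 2 the small epsilon.
findings = list(non_normalized_names("\u03f5 = 1\nprint(\u03b5)\n"))
for line, col, spelling, normalized in findings:
    print(f"{line}:{col}: {spelling!r} is normalised to {normalized!r}")
```

This only covers identifiers; distinguishing an escape sequence from a literal newline inside a string would need a similar comparison against the raw STRING token text.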

----------
nosy: +mbussonn

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue36911>
_______________________________________

