[issue3353] make built-in tokenizer available via Python C API
Pablo Galindo Salgado
report at bugs.python.org
Wed Jan 27 16:14:20 EST 2021
Pablo Galindo Salgado <pablogsal at gmail.com> added the comment:
Problems that you are going to find:
* The C tokenizer raises syntax errors while the tokenize module does not. For example:
❯ python -c "1_"
File "<string>", line 1
1_
^
SyntaxError: invalid decimal literal
❯ python -m tokenize <<< "1_"
1,0-1,1: NUMBER '1'
1,1-1,2: NAME '_'
1,2-1,3: NEWLINE '\n'
2,0-2,0: ENDMARKER ''
* The encoding cannot be specified directly; it has to be threaded through many places.
* The readline() argument can be any object and can return anything, so the C tokenizer needs to handle that (better) in order not to crash.
* str/bytes handling in the C tokenizer.
* The C tokenizer does not have the full line in some cases, or the full line is tricky to get.
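
The first point can be checked from Python itself. The following is only a minimal sketch, not part of any patch: compile_outcome() and tokenize_outcome() are hypothetical helper names, and the sketch assumes that compile() exercises the C tokenizer while tokenize.generate_tokens() exercises whatever the tokenize module uses on the running interpreter. Both helpers catch errors, because the tokenize result depends on the Python version.

```python
import io
import tokenize


def compile_outcome(source):
    """Return 'ok', or the name of the exception raised by compile().

    compile() goes through the interpreter's own (C) tokenizer.
    """
    try:
        compile(source, "<string>", "exec")
    except SyntaxError as exc:
        return type(exc).__name__
    return "ok"


def tokenize_outcome(source):
    """Return (token name, token string) pairs from the tokenize module,
    or the name of the exception it raised.

    generate_tokens() expects a readline callable that returns str;
    tokenize.tokenize() is the bytes-based counterpart.
    """
    readline = io.StringIO(source).readline
    try:
        return [
            (tokenize.tok_name[tok.type], tok.string)
            for tok in tokenize.generate_tokens(readline)
        ]
    except (tokenize.TokenError, SyntaxError) as exc:
        # Some interpreter versions reject the input here as well.
        return type(exc).__name__


if __name__ == "__main__":
    source = "1_\n"
    print("compile: ", compile_outcome(source))
    print("tokenize:", tokenize_outcome(source))
```

If the two lines printed differ in kind (an exception name from compile() and a token list from tokenize), the interpreter shows the divergence described in the first point.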
----------
_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue3353>
_______________________________________