[New-bugs-announce] [issue26843] tokenize does not include Other_ID_Start or Other_ID_Continue in identifier
Joshua Landau
report at bugs.python.org
Sun Apr 24 21:58:44 EDT 2016
New submission from Joshua Landau:
This is effectively a continuation of https://bugs.python.org/issue9712.
The line in Lib/tokenize.py
Name = r'\w+'
must be changed to a regular expression that also accepts Other_ID_Start characters at the start and Other_ID_Continue characters elsewhere. As it stands, tokenize does not accept '℘·', even though it is a valid identifier.
See the reference here:
https://docs.python.org/3.5/reference/lexical_analysis.html#identifiers
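To illustrate, here is a small sketch of the mismatch and one possible widened pattern. The character lists are taken from Unicode's PropList.txt as of Unicode 8.0 (an assumption on my part; later Unicode versions may extend them), and the `Name` pattern below is only a hypothetical replacement, not a tested patch:

```python
import re

# Characters carrying Other_ID_Start / Other_ID_Continue (from Unicode's
# PropList.txt; exact contents depend on the Unicode version -- these are
# the Unicode 8.0 values, an assumption).
Other_ID_Start = '\u1885\u1886\u2118\u212e\u309b\u309c'
Other_ID_Continue = '\u00b7\u0387\u1369-\u1371\u19da'

# Hypothetical widened Name pattern for Lib/tokenize.py: keep \w for the
# ordinary ID_Start/ID_Continue categories and splice in the extras.
Name = '[\\w{0}][\\w{0}{1}]*'.format(Other_ID_Start, Other_ID_Continue)

assert '℘·'.isidentifier()                   # the compiler accepts it
assert re.fullmatch(r'\w+', '℘·') is None    # current tokenize pattern does not
assert re.fullmatch(Name, '℘·') is not None  # the widened pattern does
```

('℘' is U+2118 SCRIPT CAPITAL P, which has Other_ID_Start; '·' is U+00B7 MIDDLE DOT, which has Other_ID_Continue. Neither is matched by `\w`.)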
I'm unsure whether Unicode normalization (i.e. the `xid` properties) needs to be handled as well.
Credit to toriningen from http://stackoverflow.com/a/29586366/1763356.
----------
components: Library (Lib)
messages: 264145
nosy: Joshua.Landau
priority: normal
severity: normal
status: open
title: tokenize does not include Other_ID_Start or Other_ID_Continue in identifier
type: behavior
versions: Python 3.5
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue26843>
_______________________________________