[issue12731] python lib re uses obsolete sense of \w in full violation of UTS#18 RL1.2a
Terry J. Reedy
report at bugs.python.org
Wed Mar 14 20:33:15 EDT 2018
Terry J. Reedy <tjreedy at udel.edu> added the comment:
Whatever I may have said before, I favor supporting the Unicode standard for \w, which is related to the standard for identifiers.
This is one of 2 issues about \w being defined too narrowly. I am somewhat arbitrarily closing #1693050 as a duplicate of this (fewer digits ;-).
There are 3 issues about tokenize.tokenize failing on valid identifiers, where an identifier is defined as a \w sequence whose first char is itself an identifier (and therefore a start char). In msg313814 of #32987, Serhiy indicates which start and continue identifier characters are matched by \W (that is, not matched by \w) for re and regex. I am leaving #24194 open as the tokenizer name issue.
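For anyone who wants to reproduce that kind of list for re, here is a minimal sketch. It is not Serhiy's script; the helper name and the `limit` parameter are made up for illustration, and it assumes that str.isidentifier() is an acceptable stand-in for the language's start/continue rules. It only covers the stdlib re module, not the third-party regex module.

```python
import re
import sys

# A str pattern, so \w uses Unicode matching by default in Python 3.
WORD = re.compile(r"\w")

def identifier_chars_missed_by_w(limit=sys.maxunicode + 1):
    # Hypothetical helper: returns (start, cont), two lists of code points
    # below `limit` that are legal in identifiers but are not matched by \w.
    start, cont = [], []
    for cp in range(limit):
        ch = chr(cp)
        if WORD.match(ch):
            continue  # \w already covers this character
        if ch.isidentifier():
            start.append(cp)  # can begin an identifier
        elif ("a" + ch).isidentifier():
            cont.append(cp)  # can only continue an identifier
    return start, cont
```

Calling identifier_chars_missed_by_w() with no argument scans every code point, which takes a moment; passing a smaller limit such as 0x10000 restricts the scan to the BMP. The exact lists depend on the Unicode database version of the Python build.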
----------
stage: needs patch -> test needed
versions: +Python 3.6, Python 3.7, Python 3.8 -Python 2.7, Python 3.3, Python 3.4
_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue12731>
_______________________________________