[Python-checkins] bpo-31672: Fix string.Template accidentally matched non-ASCII identifiers (GH-3872)
INADA Naoki
webhook-mailer at python.org
Sat Oct 14 01:22:06 EDT 2017
https://github.com/python/cpython/commit/7060380d577690a40ebc201c0725076349e977cd
commit: 7060380d577690a40ebc201c0725076349e977cd
branch: 3.6
author: INADA Naoki <methane at users.noreply.github.com>
committer: GitHub <noreply at github.com>
date: 2017-10-14T14:21:59+09:00
summary:
bpo-31672: Fix string.Template accidentally matched non-ASCII identifiers (GH-3872)
The pattern `[a-z]` with the `IGNORECASE` flag can match some non-ASCII characters.
The straightforward fix would be to use the `IGNORECASE | ASCII` flags.
But users may subclass `Template` and override only `idpattern`, so we want to
avoid changing `Template.flags`.
Instead, this commit uses the local flag `-i` for `idpattern` and changes `[a-z]` to `[a-zA-Z]`.
(cherry picked from commit b22273ec5d1992b0cbe078b887427ae9977dfb78)
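The behaviour described in the commit message can be reproduced with a small sketch like the one below. It is not part of the commit. The variable names are illustrative, and the two sample characters are the ones used in the commit's test (U+0130 and U+0131).

```python
import re

# Sample characters taken from the commit's test case.
samples = ["\u0130", "\u0131"]

# Old default idpattern, compiled with Template's default flag.
old = re.compile(r"[_a-z][_a-z0-9]*", re.IGNORECASE)

# New default idpattern: the local flag -i switches case-insensitivity off
# inside the group, even though IGNORECASE is still passed globally.
new = re.compile(r"(?-i:[_a-zA-Z][_a-zA-Z0-9]*)", re.IGNORECASE)

# The alternative the commit message mentions but does not adopt.
ascii_flags = re.compile(r"[_a-z][_a-z0-9]*", re.IGNORECASE | re.ASCII)

for ch in samples:
    print(
        f"U+{ord(ch):04X}",
        "old:", bool(old.fullmatch(ch)),
        "new:", bool(new.fullmatch(ch)),
        "ascii:", bool(ascii_flags.fullmatch(ch)),
    )
```

The local-flag form leaves `Template.flags` untouched, so a subclass that overrides only `idpattern` still gets the case-insensitive matching it had before.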
files:
A Misc/NEWS.d/next/Library/2017-10-12-02-47-16.bpo-31672.DaOkVd.rst
M Doc/library/string.rst
M Lib/string.py
M Lib/test/test_string.py
diff --git a/Doc/library/string.rst b/Doc/library/string.rst
index a0977b64613..7a9fcc38bbd 100644
--- a/Doc/library/string.rst
+++ b/Doc/library/string.rst
@@ -746,8 +746,18 @@ to parse template strings. To do this, you can override these class attributes:
* *idpattern* -- This is the regular expression describing the pattern for
non-braced placeholders (the braces will be added automatically as
- appropriate). The default value is the regular expression
- ``[_a-z][_a-z0-9]*``.
+ appropriate). The default value is the regular expression
+ ``(?-i:[_a-zA-Z][_a-zA-Z0-9]*)``.
+
+ .. note::
+
+ Since default *flags* is ``re.IGNORECASE``, pattern ``[a-z]`` can match
+ with some non-ASCII characters. That's why we use local ``-i`` flag here.
+
+ While *flags* is kept to ``re.IGNORECASE`` for backward compatibility,
+ you can override it to ``0`` or ``re.IGNORECASE | re.ASCII`` when
+ subclassing.
+
* *flags* -- The regular expression flags that will be applied when compiling
the regular expression used for recognizing substitutions. The default value
diff --git a/Lib/string.py b/Lib/string.py
index c9020076437..670c1951a8a 100644
--- a/Lib/string.py
+++ b/Lib/string.py
@@ -78,7 +78,11 @@ class Template(metaclass=_TemplateMetaclass):
"""A string class for supporting $-substitutions."""
delimiter = '$'
- idpattern = r'[_a-z][_a-z0-9]*'
+ # r'[a-z]' matches to non-ASCII letters when used with IGNORECASE,
+ # but without ASCII flag. We can't add re.ASCII to flags because of
+ # backward compatibility. So we use local -i flag and [a-zA-Z] pattern.
+ # See https://bugs.python.org/issue31672
+ idpattern = r'(?-i:[_a-zA-Z][_a-zA-Z0-9]*)'
flags = _re.IGNORECASE
def __init__(self, template):
diff --git a/Lib/test/test_string.py b/Lib/test/test_string.py
index 70439f85c89..8db23e76c1c 100644
--- a/Lib/test/test_string.py
+++ b/Lib/test/test_string.py
@@ -271,6 +271,12 @@ def test_invalid_placeholders(self):
raises(ValueError, s.substitute, dict(who='tim'))
s = Template('$who likes $100')
raises(ValueError, s.substitute, dict(who='tim'))
+ # Template.idpattern should match to only ASCII characters.
+ # https://bugs.python.org/issue31672
+ s = Template("$who likes $\u0131") # (DOTLESS I)
+ raises(ValueError, s.substitute, dict(who='tim'))
+ s = Template("$who likes $\u0130") # (LATIN CAPITAL LETTER I WITH DOT ABOVE)
+ raises(ValueError, s.substitute, dict(who='tim'))
def test_idpattern_override(self):
class PathPattern(Template):
diff --git a/Misc/NEWS.d/next/Library/2017-10-12-02-47-16.bpo-31672.DaOkVd.rst b/Misc/NEWS.d/next/Library/2017-10-12-02-47-16.bpo-31672.DaOkVd.rst
new file mode 100644
index 00000000000..b8de1f3b1db
--- /dev/null
+++ b/Misc/NEWS.d/next/Library/2017-10-12-02-47-16.bpo-31672.DaOkVd.rst
@@ -0,0 +1,2 @@
+``idpattern`` in ``string.Template`` matched some non-ASCII characters. Now
+it uses ``-i`` regular expression local flag to avoid non-ASCII characters.
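
As a rough illustration of the documentation note in this change, the sketch below assumes an interpreter that already includes the fix. `AsciiTemplate` is a hypothetical subclass name, not something defined by the commit. It shows the default class rejecting a non-ASCII placeholder and a subclass that overrides *flags* with ``re.IGNORECASE | re.ASCII``.

```python
import re
from string import Template

def rejects(template_cls, text, **mapping):
    """Return True if substitute() raises ValueError for an invalid placeholder."""
    try:
        template_cls(text).substitute(**mapping)
    except ValueError:
        return True
    return False

class AsciiTemplate(Template):
    # Hypothetical subclass: keeps the old-style pattern and restricts
    # case-insensitive matching to ASCII through the flags attribute.
    idpattern = r"[_a-z][_a-z0-9]*"
    flags = re.IGNORECASE | re.ASCII

# "$\u0131" (DOTLESS I) is not a valid placeholder name after the fix.
default_rejects = rejects(Template, "$who likes $\u0131", who="tim")
subclass_rejects = rejects(AsciiTemplate, "$who likes $\u0131", who="tim")

# ASCII names in either case are still accepted by the subclass.
rendered = AsciiTemplate("$Who likes $what").substitute(Who="tim", what="tea")
print(default_rejects, subclass_rejects, rendered)
```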