[New-bugs-announce] [issue37996] 2to3 introduces unwanted extra backslashes for unicode characters in regular expressions

Bob Kline report at bugs.python.org
Sat Aug 31 12:15:28 EDT 2019


New submission from Bob Kline <bkline at rksystems.com>:

-    UNWANTED = re.compile("""['".,?!:;()[\]{}<>\u201C\u201D\u00A1\u00BF]+""")
+    UNWANTED = re.compile("""['".,?!:;()[\]{}<>\\u201C\\u201D\\u00A1\\u00BF]+""")

The non-ASCII characters in the original string are perfectly legitimate str characters, written with standard escapes that the Python parser recognizes and handles. There is no need to lengthen the string argument passed to re.compile() and defer conversion of the doubled escapes to the regular expression engine.
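
The difference between the two forms can be checked directly. This is a minimal sketch, assuming Python 3.3 or later, where the re module itself accepts \uXXXX escapes in str patterns. The character class is shortened to the four non-ASCII characters, and the names PARSED, DEFERRED and strip_unwanted are illustrative, not taken from the original code.

```python
import re

# Escapes resolved by the Python parser: the pattern string already
# contains the four actual characters.
PARSED = "[\u201C\u201D\u00A1\u00BF]+"

# Doubled backslashes, as in the 2to3 output: the pattern string contains
# literal backslash-u sequences, which the re module resolves when compiling.
DEFERRED = "[\\u201C\\u201D\\u00A1\\u00BF]+"


def strip_unwanted(pattern, text):
    """Remove every run of characters matched by pattern from text."""
    return re.sub(pattern, "", text)


sample = "\u201C\u00A1Hola!\u201D"

# The pattern strings differ in length, but both match the same characters.
print(len(PARSED), len(DEFERRED))
print(strip_unwanted(PARSED, sample) == strip_unwanted(DEFERRED, sample))
```

Both patterns behave the same at match time, so the doubled form is not wrong; it only makes the source longer and moves the escape handling from the parser to the regular expression engine.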

----------
components: 2to3 (2.x to 3.x conversion tool)
messages: 350922
nosy: bkline
priority: normal
severity: normal
status: open
title: 2to3 introduces unwanted extra backslashes for unicode characters in regular expressions
type: behavior
versions: Python 3.7

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue37996>
_______________________________________

