[Python-checkins] bpo-32614: Modify re examples to use a raw string to prevent warning (GH-5265)

Fri Feb 2 16:16:36 EST 2018

https://github.com/python/cpython/commit/66771422d0541289d0b1287bc3c28e8b5609f6b4
commit: 66771422d0541289d0b1287bc3c28e8b5609f6b4
branch: master
author: Cheryl Sabella <cheryl.sabella at gmail.com>
committer: Terry Jan Reedy <tjreedy at udel.edu>
date: 2018-02-02T16:16:27-05:00
summary:

bpo-32614: Modify re examples to use a raw string to prevent warning (GH-5265)

Modify RE examples in documentation to use raw strings to prevent DeprecationWarning.
Add text to REGEX HOWTO to highlight the deprecation.  Approved by Serhiy Storchaka.

files:
A Misc/NEWS.d/next/Documentation/2018-02-02-07-41-57.bpo-32614.LSqzGw.rst
M Doc/howto/regex.rst
M Doc/howto/unicode.rst
M Doc/library/re.rst

diff --git a/Doc/howto/regex.rst b/Doc/howto/regex.rst
index 87a6b1aba59f..bdf687ee4551 100644
--- a/Doc/howto/regex.rst
+++ b/Doc/howto/regex.rst
@@ -289,6 +289,8 @@ Putting REs in strings keeps the Python language simpler, but has one
 disadvantage which is the topic of the next section.
 
 
+.. _the-backslash-plague:
+
 The Backslash Plague
 --------------------
 
@@ -327,6 +329,13 @@ backslashes are not handled in any special way in a string literal prefixed with
 while ``"\n"`` is a one-character string containing a newline. Regular
 expressions will often be written in Python code using this raw string notation.
 
+In addition, special escape sequences that are valid in regular expressions,
+but not valid as Python string literals, now result in a
+:exc:`DeprecationWarning` and will eventually become a :exc:`SyntaxError`,
+which means the sequences will be invalid if raw string notation or escaping
+the backslashes isn't used.
+
+
 +-------------------+------------------+
 | Regular String    | Raw string       |
 +===================+==================+
@@ -457,10 +466,16 @@ In actual programs, the most common style is to store the
 Two pattern methods return all of the matches for a pattern.
 :meth:`~re.Pattern.findall` returns a list of matching strings::
 
-   >>> p = re.compile('\d+')
+   >>> p = re.compile(r'\d+')
    >>> p.findall('12 drummers drumming, 11 pipers piping, 10 lords a-leaping')
    ['12', '11', '10']
 
+The ``r`` prefix, making the literal a raw string literal, is needed in this
+example because escape sequences in a normal "cooked" string literal that are
+not recognized by Python, as opposed to regular expressions, now result in a
+:exc:`DeprecationWarning` and will eventually become a :exc:`SyntaxError`.  See
+:ref:`the-backslash-plague`.
+
 :meth:`~re.Pattern.findall` has to create the entire list before it can be returned as the
 result.  The :meth:`~re.Pattern.finditer` method returns a sequence of
 :ref:`match object <match-objects>` instances as an :term:`iterator`::
@@ -1096,11 +1111,11 @@ following calls::
 The module-level function :func:`re.split` adds the RE to be used as the first
 argument, but is otherwise the same.   ::
 
-   >>> re.split('[\W]+', 'Words, words, words.')
+   >>> re.split(r'[\W]+', 'Words, words, words.')
    ['Words', 'words', 'words', '']
-   >>> re.split('([\W]+)', 'Words, words, words.')
+   >>> re.split(r'([\W]+)', 'Words, words, words.')
    ['Words', ', ', 'words', ', ', 'words', '.', '']
-   >>> re.split('[\W]+', 'Words, words, words.', 1)
+   >>> re.split(r'[\W]+', 'Words, words, words.', 1)
    ['Words', 'words, words.']
 
 
diff --git a/Doc/howto/unicode.rst b/Doc/howto/unicode.rst
index d4b8f8d2204a..093f4454af1d 100644
--- a/Doc/howto/unicode.rst
+++ b/Doc/howto/unicode.rst
@@ -463,7 +463,7 @@ The string in this example has the number 57 written in both Thai and
 Arabic numerals::
 
    import re
-   p = re.compile('\d+')
+   p = re.compile(r'\d+')
 
    s = "Over \u0e55\u0e57 57 flavours"
    m = p.search(s)
diff --git a/Doc/library/re.rst b/Doc/library/re.rst
index 9b175f4e9675..83ebe7db01ad 100644
--- a/Doc/library/re.rst
+++ b/Doc/library/re.rst
@@ -345,7 +345,7 @@ The special characters are:
 
    This example looks for a word following a hyphen:
 
-      >>> m = re.search('(?<=-)\w+', 'spam-egg')
+      >>> m = re.search(r'(?<=-)\w+', 'spam-egg')
       >>> m.group(0)
       'egg'
 
diff --git a/Misc/NEWS.d/next/Documentation/2018-02-02-07-41-57.bpo-32614.LSqzGw.rst b/Misc/NEWS.d/next/Documentation/2018-02-02-07-41-57.bpo-32614.LSqzGw.rst
new file mode 100644
index 000000000000..9e9f3e3a74df
--- /dev/null
+++ b/Misc/NEWS.d/next/Documentation/2018-02-02-07-41-57.bpo-32614.LSqzGw.rst
@@ -0,0 +1,3 @@
+Modify RE examples in documentation to use raw strings to prevent
+:exc:`DeprecationWarning` and add text to REGEX HOWTO to highlight the
+deprecation.