[Python-checkins] peps: Update PEP 414 to record the exclusion of raw Unicode literals from the scope

Wed Jun 20 13:46:11 CEST 2012

http://hg.python.org/peps/rev/f565858c556a
changeset:   4469:f565858c556a
user:        Nick Coghlan <ncoghlan at gmail.com>
date:        Wed Jun 20 21:45:58 2012 +1000
summary:
  Update PEP 414 to record the exclusion of raw Unicode literals from the scope

files:
  pep-0403.txt |   2 ++
  pep-0414.txt |  38 +++++++++++++++++++++++++++++++++-----
  2 files changed, 35 insertions(+), 5 deletions(-)

diff --git a/pep-0403.txt b/pep-0403.txt
--- a/pep-0403.txt
+++ b/pep-0403.txt
@@ -90,6 +90,8 @@
     def adder(i):
         return lambda x: x + i
 
+If a list comprehension grows to the 
+
 
 Proposal
 ========
diff --git a/pep-0414.txt b/pep-0414.txt
--- a/pep-0414.txt
+++ b/pep-0414.txt
@@ -40,7 +40,7 @@
 Specifically, the Python 3 definition for string literal prefixes will be
 expanded to allow::
 
-    "u" | "U" | "ur" | "UR" | "Ur" | "uR"
+    "u" | "U"
 
 in addition to the currently supported::
 
@@ -61,13 +61,40 @@
     U'''text'''
     U"""text"""
 
-Combination of the unicode prefix with the raw string prefix will also be
-supported, just as it was in Python 2.
-
 No changes are proposed to Python 3's actual Unicode handling, only to the
 acceptable forms for string literals.
 
 
+Exclusion of "Raw" Unicode Literals
+===================================
+
+Python 2 supports a concept of "raw" Unicode literals that don't meet the
+conventional definition of a raw string: ``\uXXXX`` and ``\UXXXXXXXX`` escape
+sequences are still processed by the compiler and converted to the
+appropriate Unicode code points when creating the associated Unicode objects.
+
+Python 3 has no corresponding concept - the compiler performs *no*
+preprocessing of the contents of raw string literals. This matches the
+behaviour of 8-bit raw string literals in Python 2.
+
+Since such strings are rarely used and would be interpreted differently in
+Python 3 if permitted, it was decided that leaving them out entirely was
+a better choice. Code which uses them will thus still fail immediately on
+Python 3 (with a Syntax Error), rather than potentially producing different
+output.
+
+To get equivalent behaviour that will run on both Python 2 and Python 3,
+either an ordinary Unicode literal can be used (with appropriate additional
+escaping within the string), or else string concatenation or string
+formatting can be used to combine the raw portions of the string with those
+that require the use of Unicode escape sequences.
+
+Note that when using ``from __future__ import unicode_literals`` in Python 2,
+the nominally "raw" Unicode string literals will process ``\uXXXX`` and
+``\UXXXXXXXX`` escape sequences, just like Python 2 strings explicitly marked
+with the "raw Unicode" prefix.
+
+
 Author's Note
 =============
 
@@ -318,7 +345,8 @@
 how to use them properly".
 
 These responses are a case of completely missing the point of what people are
-complaining about. The feedback that resulted in this PEP isn't due to people complaining that ports aren't possible. Instead, the feedback is coming from
+complaining about. The feedback that resulted in this PEP isn't due to people
+complaining that ports aren't possible. Instead, the feedback is coming from
 people that have successfully *completed* ports and are objecting that they
 found the experience thoroughly *unpleasant* for the class of application that
 they needed to port (specifically, Unicode aware web frameworks and support

-- 
Repository URL: http://hg.python.org/peps
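
As an illustration of the portable alternatives described in the new "Exclusion of "Raw" Unicode Literals" section, here is a minimal sketch, assuming Python 3.3 or later. The helper name ``compiles`` and the sample pattern are hypothetical and do not come from the PEP. The sketch checks that the plain ``u`` prefix is accepted while the combined ``ur`` prefix is rejected, then builds the same text three ways that should also work on Python 2.

```python
# Illustrative sketch only: the helper and the sample strings are assumptions,
# not part of PEP 414 itself. Intended for Python 3.3 or later.

def compiles(source):
    """Return True if the compiler accepts `source` as an expression."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False

# The plain unicode prefix is accepted; the raw unicode prefix is not.
u_ok = compiles("u'text'")
ur_ok = compiles("ur'text'")

# Python 2 code might have written ur'\d+\u00e9' (a regex fragment followed
# by an escaped code point). Three replacements that avoid the ur prefix:

# 1. An ordinary unicode literal with the backslash escaped by hand.
escaped = u'\\d+\u00e9'

# 2. Concatenating a raw portion with a portion that needs the escape.
concatenated = r'\d+' + u'\u00e9'

# 3. String formatting to splice the raw portion into the unicode literal.
formatted = u'%s\u00e9' % r'\d+'

print(u_ok, ur_ok)
print(escaped == concatenated == formatted)
```

The concatenation and formatting forms keep the raw portion readable, which may matter most for regular expressions and Windows paths. On Python 2 they rely on the raw byte string being pure ASCII so that it can be implicitly decoded.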