[Python-checkins] r53882 - peps/trunk/pep-0000.txt peps/trunk/pep-0358.txt peps/trunk/pep-3112.txt

Sat Feb 24 06:42:55 CET 2007

Author: guido.van.rossum
Date: Sat Feb 24 06:42:52 2007
New Revision: 53882

Added:
   peps/trunk/pep-3112.txt   (contents, props changed)
Modified:
   peps/trunk/pep-0000.txt
   peps/trunk/pep-0358.txt
Log:
Add the bytes literal PEP, by Jason Orendorff.
And accept it!


Modified: peps/trunk/pep-0000.txt
==============================================================================

--- peps/trunk/pep-0000.txt	(original)
+++ peps/trunk/pep-0000.txt	Sat Feb 24 06:42:52 2007
@@ -83,6 +83,7 @@
  SA 3109  Raising Exceptions in Python 3000            Winter
  SA 3110  Catching Exceptions in Python 3000           Winter
  SA 3111  Simple input built-in in Python 3000         Roberge
+ SA 3112  Bytes literals in Python 3000                Orendorff
 
  Open PEPs (under consideration)
 
@@ -455,6 +456,7 @@
  SA 3109  Raising Exceptions in Python 3000            Winter
  SA 3110  Catching Exceptions in Python 3000           Winter
  SA 3111  Simple input built-in in Python 3000         Roberge
+ SA 3112  Bytes literals in Python 3000                Orendorff
 
 Key
 

Modified: peps/trunk/pep-0358.txt
==============================================================================
--- peps/trunk/pep-0358.txt	(original)
+++ peps/trunk/pep-0358.txt	Sat Feb 24 06:42:52 2007
@@ -181,6 +181,9 @@
       be added to language to allow objects to be converted into byte
       arrays.  This decision is out of scope.
 
+    * A bytes literal of the form b"..." is also proposed.  This is
+      the subject of PEP 3112.
+
 
 Open Issues
 
@@ -195,14 +198,7 @@
 
     * Should all those list methods really be implemented?
 
-    * There is growing support for a b"..." literal.  Here's a brief
-      spec.  Each invocation of b"..." produces a new bytes object
-      (this is unlike "..." but similar to [...] and {...}).  Inside
-      the literal, only ASCII characters and non-Unicode backslash
-      escapes are allowed; non-ASCII characters not specified as
-      escapes are rejected by the compiler regardless of the source
-      encoding.  The resulting object's value is the same as if
-      bytes(map(ord, "...")) were called.
+    * Now that a b"..." literal exists, shouldn't repr() return one?
 
     * A case could be made for supporting .ljust(), .rjust(),
       .center() with a mandatory second argument.
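
The brief spec removed from pep-0358.txt above says a b"..." literal has the same value as bytes(map(ord, "...")). A minimal sketch of that equivalence, assuming a current Python 3 interpreter rather than the 2007 Py3K branch; the sample values are taken from the PEP 3112 text and the names `pairs` and `results` are illustrative only:

```python
# Sketch only: compares each literal with the bytes(map(ord, ...)) spelling
# named in the removed PEP 358 text. Assumes a current Python 3 interpreter.
pairs = [
    (b"Hello world", "Hello world"),
    (b"EXIT\r\n", "EXIT\r\n"),
    (b"\x7fELF\x01\x01\x01\0", "\x7fELF\x01\x01\x01\0"),
]

# ord() maps each character to its code point; bytes() accepts an iterable
# of integers in range(256), so ASCII text and \x escapes round-trip.
results = [literal == bytes(map(ord, text)) for literal, text in pairs]
print(results)
```

The equivalence only holds for code points below 256; the literal form additionally rejects raw non-ASCII characters at compile time.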

Added: peps/trunk/pep-3112.txt
==============================================================================
--- (empty file)
+++ peps/trunk/pep-3112.txt	Sat Feb 24 06:42:52 2007
@@ -0,0 +1,159 @@
+PEP: 3112
+Title: Bytes literals in Python 3000
+Version: $Revision$
+Last-Modified: $Date$
+Author: Jason Orendorff <jason.orendorff at gmail.com>
+Status: Accepted
+Type: Standards Track
+Content-Type: text/x-rst
+Requires: 358
+Created: 23-Feb-2007
+Python-Version: 3.0
+Post-History: 23-Feb-2007
+
+
+Abstract
+========
+
+This PEP proposes a literal syntax for the ``bytes`` objects
+introduced in PEP 358.  The purpose is to provide a convenient way to
+spell ASCII strings and arbitrary binary data.
+
+
+Motivation
+==========
+
+Existing spellings of an ASCII string in Python 3000 include::
+
+    bytes('Hello world', 'ascii')
+    'Hello world'.encode('ascii')
+
+The proposed syntax is::
+
+    b'Hello world'
+
+Existing spellings of an 8-bit binary sequence in Python 3000 include::
+
+    bytes([0x7f, 0x45, 0x4c, 0x46, 0x01, 0x01, 0x01, 0x00])
+    bytes('\x7fELF\x01\x01\x01\0', 'latin-1')
+    '7f454c4601010100'.decode('hex')
+
+The proposed syntax is::
+
+    b'\x7f\x45\x4c\x46\x01\x01\x01\x00'
+    b'\x7fELF\x01\x01\x01\0'
+
+In both cases, the advantages of the new syntax are brevity, some
+small efficiency gain, and the detection of encoding errors at compile
+time rather than at runtime.  The brevity benefit is especially felt
+when using the string-like methods of bytes objects::
+
+    lines = bdata.split(bytes('\n', 'ascii'))  # existing syntax
+    lines = bdata.split(b'\n')  # proposed syntax
+
+And when converting code from Python 2.x to Python 3000::
+
+    sok.send('EXIT\r\n')  # Python 2.x
+    sok.send('EXIT\r\n'.encode('ascii'))  # Python 3000 existing
+    sok.send(b'EXIT\r\n')  # proposed
+
+
+Grammar Changes
+===============
+
+The proposed syntax is an extension of the existing string
+syntax. [#stringliterals]_
+
+The new syntax for strings, including the new bytes literal, is::
+
+  stringliteral: [stringprefix] (shortstring | longstring)
+  stringprefix: "b" | "r" | "br" | "B" | "R" | "BR" | "Br" | "bR"
+  shortstring: "'" shortstringitem* "'" | '"' shortstringitem* '"'
+  longstring: "'''" longstringitem* "'''" | '"""' longstringitem* '"""'
+  shortstringitem: shortstringchar | escapeseq
+  longstringitem: longstringchar | escapeseq
+  shortstringchar:
+    <any source character except "\" or newline or the quote>
+  longstringchar: <any source character except "\">
+  escapeseq: "\" NL
+    | "\\" | "\'" | '\"'
+    | "\a" | "\b" | "\f" | "\n" | "\r" | "\t" | "\v"
+    | "\ooo" | "\xhh" 
+    | "\uxxxx" | "\Uxxxxxxxx" | "\N{name}"
+
+The following additional restrictions apply only to bytes literals
+(``stringliteral`` tokens with ``b`` or ``B`` in the
+``stringprefix``):
+
+- Each ``shortstringchar`` or ``longstringchar`` must be a character
+  between 1 and 127 inclusive, regardless of any encoding
+  declaration [#encodings]_ in the source file.
+
+- The Unicode-specific escape sequences ``\u``\ *xxxx*,
+  ``\U``\ *xxxxxxxx*, and ``\N{``\ *name*\ ``}`` are unrecognized in
+  Python 2.x and forbidden in Python 3000.
+
+Adjacent bytes literals are subject to the same concatenation rules as
+adjacent string literals. [#concat]_  A bytes literal adjacent to a
+string literal is an error.
+
+
+Semantics
+=========
+
+Each evaluation of a bytes literal produces a new ``bytes`` object.
+The bytes in the new object are the bytes represented by the
+``shortstringitem`` or ``longstringitem`` parts of the literal, in the
+same order.
+
+
+Rationale
+=========
+
+The proposed syntax provides a cleaner migration path from Python 2.x
+to Python 3000 for most code involving 8-bit strings.  Preserving the
+old 8-bit meaning of a string literal is usually as simple as adding a
+``b`` prefix.  The one exception is Python 2.x strings containing
+bytes >127, which must be rewritten using escape sequences.
+Transcoding a source file from one encoding to another, and fixing up
+the encoding declaration, should preserve the meaning of the program.
+Python 2.x non-Unicode strings violate this principle; Python 3000
+bytes literals shouldn't.
+
+A string literal with a ``b`` in the prefix is always a syntax error
+in Python 2.5, so this syntax can be introduced in Python 2.6, along
+with the ``bytes`` type.
+
+A bytes literal produces a new object each time it is evaluated, like
+list displays and unlike string literals.  This is necessary because
+bytes objects, like lists and unlike strings, are
+mutable. [#eachnew]_
+
+
+Reference Implementation
+========================
+
+Thomas Wouters has checked an implementation into the Py3K branch,
+r53872.
+
+
+References
+==========
+
+.. [#stringliterals]
+   http://www.python.org/doc/current/ref/strings.html
+
+.. [#encodings]
+   http://www.python.org/doc/current/ref/encodings.html
+
+.. [#concat]
+   http://www.python.org/doc/current/ref/string-catenation.html
+
+.. [#eachnew]
+   http://mail.python.org/pipermail/python-3000/2007-February/005779.html
+
+
+Copyright
+=========
+
+This document has been placed in the public domain.
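
The Grammar Changes and Motivation sections of the added pep-3112.txt can be checked informally against an interpreter. The following is a rough sketch, assuming a current Python 3 release rather than the r53872 implementation; the helper `accepts` and the variable names are hypothetical. The Unicode-escape restriction is deliberately not tested, since later interpreters may not treat those escapes the way this revision of the PEP describes.

```python
def accepts(source):
    """Return True if this interpreter compiles `source` as an expression.

    Hypothetical helper for this sketch; it is not part of the PEP.
    """
    try:
        compile(source, "<pep3112-sketch>", "eval")
    except SyntaxError:
        return False
    return True


# Every bytes prefix spelling listed in the stringprefix rule should compile.
prefixes = ["b", "B", "br", "Br", "bR", "BR"]
prefix_ok = all(accepts(p + "'abc'") for p in prefixes)

# Adjacent bytes literals concatenate like adjacent string literals;
# a bytes literal next to a string literal is an error.
joined = b"EXIT" b"\r\n"
mixed_ok = accepts("b'EXIT' '\\r\\n'")

# A raw non-ASCII character is rejected at compile time, while the same
# byte value written as a \x escape is accepted.
raw_ok = accepts("b'caf\u00e9'")
escaped_ok = accepts("b'caf\\xe9'")

# The longer spellings from the Motivation section agree with the literals.
same_ascii = "Hello world".encode("ascii") == b"Hello world"
same_binary = (
    bytes([0x7f, 0x45, 0x4c, 0x46, 0x01, 0x01, 0x01, 0x00])
    == b"\x7fELF\x01\x01\x01\0"
)

print(prefix_ok, joined, mixed_ok, raw_ok, escaped_ok, same_ascii, same_binary)
```

Using `compile` rather than `eval` keeps the check at the syntax level, which is where the PEP places these restrictions.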