[Python-checkins] peps: Create PEP 460 "Add bytes % args and bytes.format(args) to Python 3.5"

victor.stinner python-checkins at python.org
Mon Jan 6 14:19:10 CET 2014


http://hg.python.org/peps/rev/7a92360bbdff
changeset:   5337:7a92360bbdff
user:        Victor Stinner <victor.stinner at gmail.com>
date:        Mon Jan 06 14:01:09 2014 +0100
summary:
  Create PEP 460 "Add bytes % args and bytes.format(args) to Python 3.5"

files:
  pep-0460.txt |  175 +++++++++++++++++++++++++++++++++++++++
  1 files changed, 175 insertions(+), 0 deletions(-)


diff --git a/pep-0460.txt b/pep-0460.txt
new file mode 100644
--- /dev/null
+++ b/pep-0460.txt
@@ -0,0 +1,175 @@
+PEP: 460
+Title: Add bytes % args and bytes.format(args) to Python 3.5
+Version: $Revision$
+Last-Modified: $Date$
+Author: Victor Stinner <victor.stinner at gmail.com>
+Status: Draft
+Type: Standards Track
+Content-Type: text/x-rst
+Created: 6-Jan-2014
+Python-Version: 3.5
+
+
+Abstract
+========
+
+Add ``bytes % args`` operator and ``bytes.format(args)`` method to
+Python 3.5.
+
+
+Rationale
+=========
+
+``bytes % args`` and ``bytes.format(args)`` have been removed in Python
+3. This operator and this method are requested by Mercurial and Twisted
+developers to ease porting their projects to Python 3.
+
+Python 3 suggests formatting text first and then encoding it to bytes.
+In some cases this does not make sense because the arguments are bytes
+strings. A typical use case is a binary network protocol, where data is
+sent to and received from sockets. For example, SMTP, SIP, HTTP, IMAP,
+POP and FTP use ASCII commands interspersed with binary data.
+
+Using multiple ``bytes + bytes`` instructions is inefficient because it
+requires temporary buffers and copies which are slow and waste memory.
+Python 3.3 optimizes ``str2 += str1`` but not ``bytes2 += bytes1``.
+
+``bytes % args`` and ``bytes.format(args)`` have been requested since
+2008, even before the first release of Python 3.0 (see issue #3982).
+
+``struct.pack()`` is incomplete. For example, it cannot format a number
+as decimal and it does not support padding a bytes string.
+
+Mercurial 2.8 still supports Python 2.4.
+
+
+Needed and excluded features
+============================
+
+Needed features:
+
+* Bytes strings: bytes, bytearray and memoryview types
+* Format integer numbers as decimal
+* Padding with spaces and null bytes
+* "%s" should use the buffer protocol, not str()
+
+The feature set is kept minimal so that the implementation stays as
+simple as possible and its cost stays low. ``str % args`` and
+``str.format(args)`` are already complex and difficult to maintain;
+the code is heavily optimized.
+
+Excluded features:
+
+* No implicit conversion from Unicode to bytes (ex: encode to ASCII or
+  to Latin1)
+* Locale support (``{!n}`` format for numbers). Locales are related to
+  text and usually to an encoding.
+* ``repr()``, ``ascii()``: ``%r``, ``{!r}``, ``%a`` and ``{!a}``
+  formats. ``repr()`` and ``ascii()`` are used for debugging; the output
+  is displayed in a terminal or a graphical widget. They are more
+  related to text.
+* Attribute access: ``{obj.attr}``
+* Indexing: ``{dict[key]}``
+* Features of struct.pack(). For example, format a number as a 32-bit unsigned
+  integer in network byte order. ``struct.pack()`` can be used to prepare
+  arguments; the implementation should be kept simple.
+* Features of int.to_bytes().
+* Features of ctypes.
+* New format protocol like a new ``__bformat__()`` method. Since the
+  list of
+  supported types is short, there is no need to add a new protocol.
+  Other types must be explicitly cast.
+* Alternate format for integer. For example, ``'{|#x}'.format(0x123)``
+  to get ``0x123``. It is more related to debugging, and the prefix can
+  easily be written in the format string (ex: ``0x%x``).
+* Relation with format() and the __format__() protocol. bytes.format()
+  and str.format() are unrelated.
+
+Unknown:
+
+* Format integer to hexadecimal? ``%x`` and ``%X``
+* Format integer to octal? ``%o``
+* Format integer to binary? ``{!b}``
+* Alignment?
+* Truncating? Truncate or raise an error?
+* format keywords? ``b'{arg}'.format(arg=5)``
+* ``str % dict`` ? ``b'%(arg)s' % {'arg': 5}``
+* Floating point number?
+* ``%i``, ``%u`` and ``%d`` formats for integer numbers?
+* Signed number? ``%+i`` and ``%-i``
+
+
+bytes % args
+============
+
+Formatters:
+
+* ``"%c"``: one byte
+* ``"%s"``: integer or bytes strings
+* ``"%20s"`` pads to 20 bytes with spaces (``b' '``)
+* ``"%020s"`` pads to 20 bytes with zeros (``b'0'``)
+* ``"%\020s"`` pads to 20 bytes with null bytes (``b'\0'``)
+
+
+bytes.format(args)
+==================
+
+Formatters:
+
+* ``"{!c}"``: one byte
+* ``"{!s}"``: integer or bytes strings
+* ``"{!.20s}"`` pads to 20 bytes with spaces (``b' '``)
+* ``"{!.020s}"`` pads to 20 bytes with zeros (``b'0'``)
+* ``"{!\020s}"`` pads to 20 bytes with null bytes (``b'\0'``)
+
+
+Examples
+========
+
+* ``b'a%sc%s' % (b'b', 4)`` gives ``b'abc4'``
+* ``b'a{}c{}'.format(b'b', 4)`` gives ``b'abc4'``
+* ``b'%c' % 88`` gives ``b'X'``
+* ``b'%%' % ()`` gives ``b'%'``
+
+
+Criticisms
+==========
+
+* The development cost and maintenance cost.
+* In Python 3.3, encoding to ASCII or Latin1 is as fast as memcpy
+* Developers must work around the lack of bytes%args and
+  bytes.format(args) anyway to support Python 3.0-3.4
+* bytes.join() is consistently faster than format to join bytes strings.
+* Formatting functions can be implemented in a third party module
+
+
+References
+==========
+
+* `Issue #3982: support .format for bytes
+  <http://bugs.python.org/issue3982>`_
+* `Mercurial project
+  <http://mercurial.selenic.com/>`_
+* `Twisted project
+  <http://twistedmatrix.com/trac/>`_
+* `Documentation of Python 2 formatting (str % args)
+  <http://docs.python.org/2/library/stdtypes.html#string-formatting>`_
+* `Documentation of Python 2 formatting (str.format)
+  <http://docs.python.org/2/library/string.html#formatstrings>`_
+
+Copyright
+=========
+
+This document has been placed in the public domain.
+
+
+

+..
+   Local Variables:
+   mode: indented-text
+   indent-tabs-mode: nil
+   sentence-end-double-space: t
+   fill-column: 70
+   coding: utf-8
+   End:
+
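
The PEP text above notes that ``struct.pack()`` can be used to prepare arguments and lists padding and decimal formatting among the needed features. As a rough illustration only, the sketch below builds a record with facilities that already exist in Python 3. The record layout and the ``build_record`` name are assumptions made up for this example; they are not part of the proposal.

```python
import struct


def build_record(name, size):
    """Build a hypothetical record: padded name, binary size, decimal size."""
    # Pad the name to 8 bytes with null bytes (left-justified).
    padded_name = name.ljust(8, b"\0")
    # 32-bit unsigned integer in network (big-endian) byte order.
    binary_size = struct.pack("!I", size)
    # Decimal representation, right-justified in 6 bytes with spaces.
    decimal_size = str(size).encode("ascii").rjust(6, b" ")
    # Join all pieces once instead of chaining bytes + bytes.
    return b"".join([padded_name, binary_size, decimal_size])


record = build_record(b"abc", 4660)
# int.to_bytes() produces the same 4 bytes as struct.pack("!I", ...).
assert record[8:12] == (4660).to_bytes(4, "big")
```

Each field here needs its own call, which is the kind of argument preparation the proposed formatters would partly replace for the decimal and padding cases.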

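The Rationale and Criticisms sections of the PEP text above mention two approaches that work on Python 3.0-3.4: formatting text first and then encoding it, and ``bytes.join()``. A minimal sketch of both, assuming an HTTP-like message; the ``build_request`` helper and the header fields are hypothetical and chosen only for illustration:

```python
def build_request(path, body):
    """Build an HTTP-like request from a text path and a bytes body."""
    # Format the ASCII header as text first, then encode it to bytes.
    header = "POST {} HTTP/1.1\r\nContent-Length: {}\r\n\r\n".format(
        path, len(body))
    # The body is already bytes, so it is joined in without any decoding.
    return b"".join([header.encode("ascii"), body])


request = build_request("/upload", b"\x00\x01\xff")
```

This works when only the binary payload is bytes; it becomes awkward when the values to interpolate into the header are themselves bytes strings, which is the case the PEP targets.
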
-- 
Repository URL: http://hg.python.org/peps

