[Python-checkins] r42556 - peps/trunk/pep-0358.txt
neil.schemenauer
python-checkins at python.org
Wed Feb 22 21:49:38 CET 2006
Author: neil.schemenauer
Date: Wed Feb 22 21:49:37 2006
New Revision: 42556
Modified:
peps/trunk/pep-0358.txt
Log:
Reformat.
Modified: peps/trunk/pep-0358.txt
==============================================================================
--- peps/trunk/pep-0358.txt (original)
+++ peps/trunk/pep-0358.txt Wed Feb 22 21:49:37 2006
@@ -12,197 +12,195 @@
Abstract
-========
-This PEP outlines the introduction of a raw bytes sequence object.
-Adding the bytes object is one step in the transition to Unicode based
-str objects.
+ This PEP outlines the introduction of a raw bytes sequence object.
+ Adding the bytes object is one step in the transition to Unicode
+ based str objects.
Motivation
-==========
-Python's current string objects are overloaded. They serve to hold
-both sequences of characters and sequences of bytes. This overloading
-of purpose leads to confusion and bugs. In future versions of Python,
-string objects will be used for holding character data. The bytes object
-will fulfil the role of a byte container. Eventually the unicode
-built-in will be renamed to str and the str object will be removed.
+ Python's current string objects are overloaded. They serve to hold
+ both sequences of characters and sequences of bytes. This
+ overloading of purpose leads to confusion and bugs. In future
+ versions of Python, string objects will be used for holding
+ character data. The bytes object will fulfil the role of a byte
+ container. Eventually the unicode built-in will be renamed to str
+ and the str object will be removed.
Specification
-=============
-A bytes object stores a mutable sequence of integers that are in the
-range 0 to 255. Unlike string objects, indexing a bytes object returns
-an integer. Assigning an element using an object that is not an integer
-causes a TypeError exception. Assigning an element to a value outside
-the range 0 to 255 causes a ValueError exception. The __len__ method of
-bytes returns the number of integers stored in the sequence (i.e. the
-number of bytes).
-
-The constructor of the bytes object has the following signature:
-
-    bytes([initialiser[, encoding]])
-
-If no arguments are provided then an object containing zero elements is
-created and returned. The initialiser argument can be a string or a
-sequence of integers. The pseudo-code for the constructor is:
-
-    def bytes(initialiser=[], encoding=None):
-        if isinstance(initialiser, basestring):
-            if isinstance(initialiser, unicode):
-                if encoding is None:
-                    encoding = sys.getdefaultencoding()
-                initialiser = initialiser.encode(encoding)
-            initialiser = [ord(c) for c in initialiser]
-        elif encoding is not None:
-            raise TypeError("explicit encoding invalid for non-string "
-                            "initialiser")
-        create bytes object and fill with integers from initialiser
-        return bytes object
-
-The __repr__ method returns a string that can be evaluated to generate a
-new bytes object containing the same sequence of integers. The sequence
-is represented by a list of ints. For example:
-
-    >>> repr(bytes([10, 20, 30]))
-    'bytes([10, 20, 30])'
-
-The object has a decode method equivalent to the decode method of the
-str object. The object has a classmethod fromhex that takes a string of
-characters from the set [0-9a-fA-F ] and returns a bytes object (similar
-to binascii.unhexlify). For example:
-
-    >>> bytes.fromhex('5c5350ff')
-    bytes([92, 83, 80, 255])
-    >>> bytes.fromhex('5c 53 50 ff')
-    bytes([92, 83, 80, 255])
-
-The object has a hex method that does the reverse conversion (similar to
-binascii.hexlify):
-
-    >>> bytes([92, 83, 80, 255]).hex()
-    '5c5350ff'
-
-The bytes object has methods similar to the list object:
-
- __add__
- __contains__
- __delitem__
- __delslice__
- __eq__
- __ge__
- __getitem__
- __getslice__
- __gt__
- __hash__
- __iadd__
- __imul__
- __iter__
- __le__
- __len__
- __lt__
- __mul__
- __ne__
- __reduce__
- __reduce_ex__
- __repr__
- __rmul__
- __setitem__
- __setslice__
- append
- count
- extend
- index
- insert
- pop
- remove
+ A bytes object stores a mutable sequence of integers that are in the
+ range 0 to 255. Unlike string objects, indexing a bytes object
+ returns an integer. Assigning an element using an object that is not
+ an integer causes a TypeError exception. Assigning an element to a
+ value outside the range 0 to 255 causes a ValueError exception. The
+ __len__ method of bytes returns the number of integers stored in the
+ sequence (i.e. the number of bytes).
+
+ The constructor of the bytes object has the following signature:
+
+        bytes([initialiser[, encoding]])
+
+ If no arguments are provided then an object containing zero elements
+ is created and returned. The initialiser argument can be a string
+ or a sequence of integers. The pseudo-code for the constructor is:
+
+        def bytes(initialiser=[], encoding=None):
+            if isinstance(initialiser, basestring):
+                if isinstance(initialiser, unicode):
+                    if encoding is None:
+                        encoding = sys.getdefaultencoding()
+                    initialiser = initialiser.encode(encoding)
+                initialiser = [ord(c) for c in initialiser]
+            elif encoding is not None:
+                raise TypeError("explicit encoding invalid for non-string "
+                                "initialiser")
+            create bytes object and fill with integers from initialiser
+            return bytes object
+
+ The __repr__ method returns a string that can be evaluated to
+ generate a new bytes object containing the same sequence of
+ integers. The sequence is represented by a list of ints. For
+ example:
+
+        >>> repr(bytes([10, 20, 30]))
+        'bytes([10, 20, 30])'
+
+ The object has a decode method equivalent to the decode method of
+ the str object. The object has a classmethod fromhex that takes a
+ string of characters from the set [0-9a-fA-F ] and returns a bytes
+ object (similar to binascii.unhexlify). For example:
+
+        >>> bytes.fromhex('5c5350ff')
+        bytes([92, 83, 80, 255])
+        >>> bytes.fromhex('5c 53 50 ff')
+        bytes([92, 83, 80, 255])
+
+ The object has a hex method that does the reverse conversion
+ (similar to binascii.hexlify):
+
+        >>> bytes([92, 83, 80, 255]).hex()
+        '5c5350ff'
+
+ The bytes object has methods similar to the list object:
+
+ __add__
+ __contains__
+ __delitem__
+ __delslice__
+ __eq__
+ __ge__
+ __getitem__
+ __getslice__
+ __gt__
+ __hash__
+ __iadd__
+ __imul__
+ __iter__
+ __le__
+ __len__
+ __lt__
+ __mul__
+ __ne__
+ __reduce__
+ __reduce_ex__
+ __repr__
+ __rmul__
+ __setitem__
+ __setslice__
+ append
+ count
+ extend
+ index
+ insert
+ pop
+ remove
Out of scope issues
-===================
-* If we provide a literal syntax for bytes then it should look distinctly
- different from the syntax for literal strings. Also, a new type, even
- built-in, is much less drastic than a new literal (which requires
- lexer and parser support in addition to everything else). Since there
- appears to be no immediate need for a literal representation,
- designing and implementing one is out of the scope of this PEP.
-
-* Python 3k will have a much different I/O subsystem. Deciding how that
- I/O subsystem will work and interact with the bytes object is out of
- the scope of this PEP.
-
-* It has been suggested that a special method named __bytes__ be added
- to the language to allow objects to be converted into byte arrays. This
- decision is out of scope.
+ * If we provide a literal syntax for bytes then it should look
+ distinctly different from the syntax for literal strings. Also, a
+ new type, even built-in, is much less drastic than a new literal
+ (which requires lexer and parser support in addition to everything
+ else). Since there appears to be no immediate need for a literal
+ representation, designing and implementing one is out of the scope
+ of this PEP.
+
+ * Python 3k will have a much different I/O subsystem. Deciding how
+ that I/O subsystem will work and interact with the bytes object is
+ out of the scope of this PEP.
+
+ * It has been suggested that a special method named __bytes__ be
+ added to the language to allow objects to be converted into byte
+ arrays. This decision is out of scope.
Unresolved issues
-=================
-* Perhaps the bytes object should be implemented as an extension module
- until we are more sure of the design (similar to how the set object
- was prototyped).
+ * Perhaps the bytes object should be implemented as an extension
+ module until we are more sure of the design (similar to how the
+ set object was prototyped).
+
+ * Should the bytes object implement the buffer interface? Probably,
+ but we need to look into the implications of that (e.g. regex
+ operations on byte arrays).
-* Should the bytes object implement the buffer interface? Probably, but
- we need to look into the implications of that (e.g. regex operations
- on byte arrays).
+ * Should the object implement __reversed__ and reverse? Should it
+ implement sort?
-* Should the object implement __reversed__ and reverse? Should it
- implement sort?
-
-* Need to clarify what some of the methods do. How are comparisons
- done? Hashing? Pickling and marshalling?
+ * Need to clarify what some of the methods do. How are comparisons
+ done? Hashing? Pickling and marshalling?
Questions and answers
-=====================
-Q: Why have the optional encoding argument when the encode method of
- Unicode objects does the same thing?
+ Q: Why have the optional encoding argument when the encode method of
+ Unicode objects does the same thing?
-A: In the current version of Python, the encode method returns a str
- object and we cannot change that without breaking code. The construct
- bytes(s.encode(...)) is expensive because it has to copy the byte
- sequence multiple times. Also, Python generally provides two ways of
- converting an object of type A into an object of type B: ask an A
- instance to convert itself to a B, or ask the type B to create a new
- instance from an A. Depending on what A and B are, both APIs make
- sense; sometimes reasons of decoupling require that A can't know
- about B, in which case you have to use the latter approach; sometimes
- B can't know about A, in which case you have to use the former.
-
-
-Q: Why does bytes ignore the encoding argument if the initialiser is a
- str?
-
-A: There is no sane meaning that the encoding can have in that case.
- str objects *are* byte arrays and they know nothing about the
- encoding of character data they contain. We need to assume that the
- programmer has provided a str object that already uses the desired
- encoding. If you need something other than a pure copy of the bytes
- then you need to first decode the string. For example:
-
- bytes(s.decode(encoding1), encoding2)
-
-
-Q: Why not have the encoding argument default to Latin-1 (or some other
- encoding that covers the entire byte range) rather than ASCII ?
-
-A: The system default encoding for Python is ASCII. It seems least
- confusing to use that default. Also, in Py3k, using Latin-1 as
- the default might not be what users expect. For example, they might
- prefer a Unicode encoding. No default will always work as
- expected. At least ASCII will complain loudly if you try to encode
- non-ASCII data.
+ A: In the current version of Python, the encode method returns a str
+ object and we cannot change that without breaking code. The
+ construct bytes(s.encode(...)) is expensive because it has to
+ copy the byte sequence multiple times. Also, Python generally
+ provides two ways of converting an object of type A into an
+ object of type B: ask an A instance to convert itself to a B, or
+ ask the type B to create a new instance from an A. Depending on
+ what A and B are, both APIs make sense; sometimes reasons of
+ decoupling require that A can't know about B, in which case you
+ have to use the latter approach; sometimes B can't know about A,
+ in which case you have to use the former.
+
+
+ Q: Why does bytes ignore the encoding argument if the initialiser is
+ a str?
+
+ A: There is no sane meaning that the encoding can have in that case.
+ str objects *are* byte arrays and they know nothing about the
+ encoding of character data they contain. We need to assume that
+ the programmer has provided str object that already uses the
+ desired encoding. If you need something other than a pure copy of
+ the bytes then you need to first decode the string. For example:
+
+ bytes(s.decode(encoding1), encoding2)
+
+
+ Q: Why not have the encoding argument default to Latin-1 (or some
+ other encoding that covers the entire byte range) rather than
+ ASCII?
+
+ A: The system default encoding for Python is ASCII. It seems least
+ confusing to use that default. Also, in Py3k, using Latin-1 as
+ the default might not be what users expect. For example, they
+ might prefer a Unicode encoding. No default will always
+ work as expected. At least ASCII will complain loudly if you try
+ to encode non-ASCII data.
Copyright
-=========
-This document has been placed in the public domain.
+ This document has been placed in the public domain.
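The Specification sets out the element rules: only integers are accepted, values must lie in the range 0 to 255, and indexing returns an int. These rules can be modelled in a few lines of modern Python. The sketch below uses a hypothetical pure-Python class, `ByteSeq`, written for Python 3. It is built on `collections.abc.MutableSequence` so that every mutating path goes through a single check: item and slice assignment, `insert`, and the inherited `append` and `extend`. It illustrates the proposed semantics under those assumptions and is not the PEP's implementation.

```python
from collections.abc import MutableSequence


class ByteSeq(MutableSequence):
    """Hypothetical model of the proposed mutable bytes object.

    Holds integers in the range 0 to 255; indexing returns an int.
    """

    def __init__(self, ints=()):
        self._data = []
        for value in ints:
            self.append(value)  # append -> insert -> _check

    @staticmethod
    def _check(value):
        # Non-integers raise TypeError, out-of-range integers ValueError.
        # (bool is a subclass of int, so True/False are accepted.)
        if not isinstance(value, int):
            raise TypeError("bytes elements must be integers")
        if not 0 <= value <= 255:
            raise ValueError("bytes elements must be in range 0 to 255")
        return value

    def __getitem__(self, index):
        if isinstance(index, slice):
            return ByteSeq(self._data[index])
        return self._data[index]  # a plain int, unlike str indexing

    def __setitem__(self, index, value):
        if isinstance(index, slice):
            self._data[index] = [self._check(v) for v in value]
        else:
            self._data[index] = self._check(value)

    def __delitem__(self, index):
        del self._data[index]

    def __len__(self):
        # The number of stored integers, i.e. the number of bytes.
        return len(self._data)

    def insert(self, index, value):
        self._data.insert(index, self._check(value))

    def __repr__(self):
        # Evaluable repr built from a list of ints, as the PEP describes
        # (the PEP's repr uses the name bytes).
        return "%s(%r)" % (type(self).__name__, self._data)
```

`append`, `extend`, `pop`, `remove`, `index`, `count` and `+=` come from the `MutableSequence` mixins. Because `append` and `extend` route through `insert`, they reject bad values without extra code.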
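The constructor pseudo-code in the Specification relies on Python 2 names: `basestring`, `unicode`, and a `sys.getdefaultencoding()` that returned ASCII at the time. The sketch below renders it in Python 3, where `str` plays the role of the PEP's `unicode` and `bytes` the role of its byte-oriented `str`. Several choices here are assumptions:

- The function name `make_bytes` is hypothetical.
- The ASCII default is hard-coded to mirror the Questions and answers section. Python 3's own default encoding is UTF-8.
- The built-in `bytearray` stands in for the proposed mutable object, because it performs the same 0 to 255 integer validation.

```python
def make_bytes(initialiser=(), encoding=None):
    """Hypothetical Python 3 rendering of the PEP's constructor pseudo-code."""
    if isinstance(initialiser, str):
        # Text (the PEP's unicode) must be encoded first.
        if encoding is None:
            encoding = "ascii"  # assumption: mirrors the PEP's ASCII default
        initialiser = initialiser.encode(encoding)
    elif encoding is not None and not isinstance(initialiser, (bytes, bytearray)):
        # Byte strings ignore the encoding; anything else may not pass one.
        raise TypeError("explicit encoding invalid for non-string initialiser")
    # bytearray copies the values and rejects non-ints / values outside 0..255.
    return bytearray(initialiser)
```

For example, `make_bytes("\u00e9", "latin-1")` yields `bytearray(b'\xe9')`. Calling `make_bytes("\u00e9")` without an encoding fails loudly with `UnicodeEncodeError`.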
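The `fromhex` and `hex` round trip described in the Specification exists in Python 3 on the built-in `bytearray` (and on `bytes`), so it can be tried directly. This sketch assumes Python 3.5 or later, where `.hex()` is available. It also uses `binascii` to show the correspondence with `unhexlify` and `hexlify` that the text mentions.

```python
import binascii

# Parse hex digits; whitespace between byte pairs is allowed.
packed = bytearray.fromhex("5c5350ff")
spaced = bytearray.fromhex("5c 53 50 ff")

# Reverse conversion back to a hex string.
as_hex = packed.hex()

# The same conversions through binascii (these return immutable bytes).
via_unhexlify = binascii.unhexlify("5c5350ff")
via_hexlify = binascii.hexlify(packed)

# Characters outside the hex digits (and spaces) are rejected.
try:
    bytearray.fromhex("5c5g")
except ValueError:
    rejected = True
else:
    rejected = False

print(list(packed), as_hex)  # [92, 83, 80, 255] 5c5350ff
```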
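Python 3's built-in `bytearray` provides most of the list-like methods listed in the Specification, so their behaviour can be sketched there. The example assumes Python 3. Note one point where Python 3 diverges from the list: `bytearray` is mutable and therefore unhashable. This bears on the hashing question raised under Unresolved issues.

```python
buf = bytearray([10, 20, 30])

buf.append(40)            # [10, 20, 30, 40]
buf.extend([50, 60])      # [10, 20, 30, 40, 50, 60]
buf.insert(0, 5)          # [5, 10, 20, 30, 40, 50, 60]
last = buf.pop()          # removes and returns 60
buf.remove(20)            # drops the first 20: [5, 10, 30, 40, 50]
del buf[1:3]              # deletes a slice: [5, 40, 50]

doubled = buf * 2         # repetition: [5, 40, 50, 5, 40, 50]
position = buf.index(50)  # 2
has_40 = 40 in buf        # membership tests integers
smaller = bytearray([1, 2]) < bytearray([1, 3])  # lexicographic comparison

try:
    hash(buf)
except TypeError:
    hashable = False      # mutable, so unhashable in Python 3
else:
    hashable = True
```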
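The first answer describes two conversion styles: ask an A instance to convert itself to a B, or ask type B to build a new instance from an A. Both can be seen in Python 3. Text can either encode itself with `str.encode`, or be handed to the target type's constructor with `bytearray(text, encoding)`. This is a small illustration using Python 3 built-ins, not the API proposed here.

```python
text = "spam"

# Ask the A instance (a str) to convert itself.
from_method = text.encode("ascii")           # bytes

# Ask type B (bytearray) to construct itself from an A.
from_constructor = bytearray(text, "ascii")  # bytearray

print(from_method == from_constructor)      # True: same byte values
```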
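The second answer's recipe reads almost the same in Python 3: decode with the encoding the bytes actually use, then encode with the one you want. The input bytes and the Latin-1/UTF-8 pair below are assumptions for illustration. Python 3 differs from the PEP in one respect. Its `bytearray` raises `TypeError` when an encoding is passed with a bytes initialiser, rather than ignoring the encoding.

```python
# Assume these bytes are known to hold Latin-1 text.
latin1_bytes = b"caf\xe9"

# Pure copy: no encoding is involved.
copied = bytearray(latin1_bytes)

# Transcode: decode with the source encoding, encode with the target.
utf8_bytes = bytearray(latin1_bytes.decode("latin-1"), "utf-8")

print(len(copied), len(utf8_bytes))  # 4 5
```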
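The last answer argues for ASCII as the default: it fails loudly on non-ASCII text, while Latin-1 accepts every byte value and other encodings silently produce different bytes. Python 3's codecs can demonstrate this. The sample string is an arbitrary choice.

```python
text = "caf\u00e9"

# ASCII refuses characters outside 0-127 instead of guessing.
try:
    text.encode("ascii")
except UnicodeEncodeError:
    ascii_failed = True
else:
    ascii_failed = False

# Other candidate defaults succeed but disagree about the resulting bytes.
latin1 = text.encode("latin-1")   # b'caf\xe9'
utf8 = text.encode("utf-8")       # b'caf\xc3\xa9'

# Latin-1 maps every byte 0-255 to a character, so it never fails.
all_bytes = bytes(range(256))
round_trip = all_bytes.decode("latin-1").encode("latin-1") == all_bytes
```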