[Python-checkins] r63904 - peps/trunk/pep-3138.txt
guido.van.rossum
python-checkins at python.org
Tue Jun 3 00:26:29 CEST 2008
Author: guido.van.rossum
Date: Tue Jun 3 00:26:21 2008
New Revision: 63904
Log:
Fix lay-out glitches and remove gmail turd.
Modified:
peps/trunk/pep-3138.txt
Modified: peps/trunk/pep-3138.txt
==============================================================================
--- peps/trunk/pep-3138.txt (original)
+++ peps/trunk/pep-3138.txt Tue Jun 3 00:26:21 2008
@@ -29,20 +29,20 @@
- Convert CR, LF, TAB and '\\' to '\\r', '\\n', '\\t', '\\\\'.
- Convert other non-printable characters(0x00-0x1f, 0x7f) and non-ASCII
- characters(>=0x80) to '\\xXX'.
+ characters(>=0x80) to '\\xXX'.
- Backslash-escape quote characters (apostrophe, ') and add the quote
- character at the beginning and the end.
+ character at the beginning and the end.
For Unicode strings, the following additional conversions are done.
- Convert leading surrogate pair characters without trailing character
- (0xd800-0xdbff, but not followed by 0xdc00-0xdfff) to '\\uXXXX'.
+ (0xd800-0xdbff, but not followed by 0xdc00-0xdfff) to '\\uXXXX'.
- Convert 16-bit characters(>=0x100) to '\\uXXXX'.
- Convert 21-bit characters(>=0x10000) and surrogate pair characters to
- '\\U00xxxxxx'.
+ '\\U00xxxxxx'.
This algorithm converts any string to printable ASCII, and repr() is
used as a handy and safe way to print strings for debugging or for
@@ -75,19 +75,19 @@
=============
- Add a new function to the Python C API ``int Py_UNICODE_ISPRINTABLE
- (Py_UNICODE ch)``. This function returns 0 if repr() should escape the
- Unicode character ``ch``; otherwise it returns 1. Characters that should
- be escaped are defined in the Unicode character database as:
-
- * Cc (Other, Control)
- * Cf (Other, Format)
- * Cs (Other, Surrogate)
- * Co (Other, Private Use)
- * Cn (Other, Not Assigned)
- * Zl (Separator, Line), refers to LINE SEPARATOR ('\\u2028').
- * Zp (Separator, Paragraph), refers to PARAGRAPH SEPARATOR ('\\u2029').
- * Zs (Separator, Space) other than ASCII space('\\x20'). Characters in
- this category should be escaped to avoid ambiguity.
+ (Py_UNICODE ch)``. This function returns 0 if repr() should escape the
+ Unicode character ``ch``; otherwise it returns 1. Characters that should
+ be escaped are defined in the Unicode character database as:
+
+ * Cc (Other, Control)
+ * Cf (Other, Format)
+ * Cs (Other, Surrogate)
+ * Co (Other, Private Use)
+ * Cn (Other, Not Assigned)
+ * Zl (Separator, Line), refers to LINE SEPARATOR ('\\u2028').
+ * Zp (Separator, Paragraph), refers to PARAGRAPH SEPARATOR ('\\u2029').
+ * Zs (Separator, Space) other than ASCII space('\\x20'). Characters in
+ this category should be escaped to avoid ambiguity.
- The algorithm to build repr() strings should be changed to:
@@ -105,22 +105,22 @@
character at the beginning and the end.
- Set the Unicode error-handler for sys.stderr to 'backslashreplace' by
- default.
+ default.
- Add ``'%a'`` string format operator. ``'%a'`` converts any python
- object to a string using repr() and then hex-escapes all non-ASCII
- characters. The ``'%a'`` format operator generates the same string as
- ``'%r'`` in Python 2.
+ object to a string using repr() and then hex-escapes all non-ASCII
+ characters. The ``'%a'`` format operator generates the same string as
+ ``'%r'`` in Python 2.
- Add a new built-in function, ``ascii()``. This function converts any
- python object to a string using repr() and then hex-escapes all non-
- ASCII characters. ``ascii()`` generates the same string as ``repr()``
- in Python 2.
+ python object to a string using repr() and then hex-escapes all non-
+ ASCII characters. ``ascii()`` generates the same string as ``repr()``
+ in Python 2.
- Add an ``isprintable()`` method to the string type. ``str.isprintable()``
- returns False if repr() should escape any character in the string;
- otherwise returns True. The ``isprintable()`` method calls the
- `` Py_UNICODE_ISPRINTABLE()`` function internally.
+ returns False if repr() should escape any character in the string;
+ otherwise returns True. The ``isprintable()`` method calls the
+ `` Py_UNICODE_ISPRINTABLE()`` function internally.
Rationale
@@ -157,38 +157,38 @@
- Supply a tool to print lists or dicts.
- Strings to be printed for debugging are not only contained by lists or
- dicts, but also in many other types of object. File objects contain a
- file name in Unicode, exception objects contain a message in Unicode,
- etc. These strings should be printed in readable form when repr()ed.
- It is unlikely to be possible to implement a tool to print all
- possible object types.
+ Strings to be printed for debugging are not only contained by lists or
+ dicts, but also in many other types of object. File objects contain a
+ file name in Unicode, exception objects contain a message in Unicode,
+ etc. These strings should be printed in readable form when repr()ed.
+ It is unlikely to be possible to implement a tool to print all
+ possible object types.
- Use sys.displayhook and sys.excepthook.
- For interactive sessions, we can write hooks to restore hex escaped
- characters to the original characters. But these hooks are called only
- when printing the result of evaluating an expression entered in an
- interactive Python session, and doesn't work for the print() function,
- for non-interactive sessions or for logging.debug("%r", ...), etc.
+ For interactive sessions, we can write hooks to restore hex escaped
+ characters to the original characters. But these hooks are called only
+ when printing the result of evaluating an expression entered in an
+ interactive Python session, and doesn't work for the print() function,
+ for non-interactive sessions or for logging.debug("%r", ...), etc.
- Subclass sys.stdout and sys.stderr.
- It is difficult to implement a subclass to restore hex-escaped
- characters since there isn't enough information left by the time it's
- a string to undo the escaping correctly in all cases. For example, ``
- print("\\"+"u0041")`` should be printed as '\\u0041', not 'A'. But
- there is no chance to tell file objects apart.
+ It is difficult to implement a subclass to restore hex-escaped
+ characters since there isn't enough information left by the time it's
+ a string to undo the escaping correctly in all cases. For example, ``
+ print("\\"+"u0041")`` should be printed as '\\u0041', not 'A'. But
+ there is no chance to tell file objects apart.
- Make the encoding used by unicode_repr() adjustable, and make the
- existing repr() the default.
+ existing repr() the default.
- With adjustable repr(), the result of using repr() is unpredictable
- and would make it impossible to write correct code involving repr().
- And if current repr() is the default, then the old convention remains
- intact and users may expect ASCII strings as the result of repr().
- Third party applications or libraries could be confused when a custom
- repr() function is used.
+ With adjustable repr(), the result of using repr() is unpredictable
+ and would make it impossible to write correct code involving repr().
+ And if current repr() is the default, then the old convention remains
+ intact and users may expect ASCII strings as the result of repr().
+ Third party applications or libraries could be confused when a custom
+ repr() function is used.
Backwards Compatibility
@@ -234,37 +234,36 @@
===========
- Is the ``ascii()`` function necessary, or is it sufficient to document
- how to do it? If necessary, should ``ascii()`` belong to the builtin
- namespace?
+ how to do it? If necessary, should ``ascii()`` belong to the builtin
+ namespace?
Rejected Proposals
==================
- Add encoding and errors arguments to the builtin print() function,
- with defaults of sys.getfilesystemencoding() and 'backslashreplace'.
+ with defaults of sys.getfilesystemencoding() and 'backslashreplace'.
- Complicated to implement, and in general, this is not seen as a good
- idea. [2]_
+ Complicated to implement, and in general, this is not seen as a good
+ idea. [2]_
- Use character names to escape characters, instead of hex character
- codes. For example, ``repr('\u03b1')`` can be converted to
- ``"\N{GREEK SMALL LETTER ALPHA}"``.
+ codes. For example, ``repr('\u03b1')`` can be converted to
+ ``"\N{GREEK SMALL LETTER ALPHA}"``.
- Using character names can be very verbose compared to hex-escape.
- e.g., ``repr("\ufbf9")`` is converted to ``"\N{ARABIC LIGATURE UIGHUR
- KIRGHIZ YEH WITH HAMZA ABOVE WITH ALEF MAKSURA ISOLATED FORM}"``.
+ Using character names can be very verbose compared to hex-escape.
+ e.g., ``repr("\ufbf9")`` is converted to ``"\N{ARABIC LIGATURE UIGHUR
+ KIRGHIZ YEH WITH HAMZA ABOVE WITH ALEF MAKSURA ISOLATED FORM}"``.
- Default error-handler of sys.stdout should be 'backslashreplace'.
- Stuff written to stdout might be consumed by another program that
- might misinterpret the \ escapes. For interactive session, it is
- possible to make 'backslashreplace' error-handler to default, but may
- add confusion of the kind "it works in interactive mode but not when
- redirecting to a file".
+ Stuff written to stdout might be consumed by another program that
+ might misinterpret the \ escapes. For interactive session, it is
+ possible to make 'backslashreplace' error-handler to default, but may
+ add confusion of the kind "it works in interactive mode but not when
+ redirecting to a file".
-- Hide quoted text -
Reference Implementation
========================
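
Reader's note (not part of the checkin): the escaping rules quoted in the
hunks above can be tried out with a minimal Python 3 sketch. The names
``ESCAPED_CATEGORIES`` and ``is_printable`` are hypothetical and only
illustrate the category list from the diff; they are not the C
implementation of ``Py_UNICODE_ISPRINTABLE``. The sketch assumes a Python 3
interpreter in which ``ascii()``, ``'%a'`` and ``str.isprintable()`` are
available.

```python
import unicodedata

# General categories that the PEP text in the diff lists as "should be
# escaped" by repr(). The set name is an illustrative assumption.
ESCAPED_CATEGORIES = {"Cc", "Cf", "Cs", "Co", "Cn", "Zl", "Zp", "Zs"}


def is_printable(ch):
    """Hypothetical pure-Python stand-in for Py_UNICODE_ISPRINTABLE."""
    if ch == " ":
        # ASCII space is the one Zs character that is not escaped.
        return True
    return unicodedata.category(ch) not in ESCAPED_CATEGORIES


alpha = "\u03b1"  # GREEK SMALL LETTER ALPHA

shown = repr(alpha)        # keeps the character itself inside the quotes
escaped = ascii(alpha)     # hex-escapes every non-ASCII character
formatted = "%a" % alpha   # same escaping through the format operator

print(escaped)             # '\u03b1'
print(formatted)           # '\u03b1'
print(shown == "'" + alpha + "'")

# Compare the sketch with the built-in method on a few sample characters.
for sample in ["a", " ", "\n", "\u00a0", "\u2028", alpha]:
    print(ascii(sample), is_printable(sample), sample.isprintable())
```

The loop only spot-checks a handful of characters; it is not a claim that
the sketch agrees with ``str.isprintable()`` for every code point.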