[Python-3000] UPDATED: PEP 3138- String representation in Python 3000

Tue May 27 03:06:55 CEST 2008

On 5/24/08, Atsuo Ishimoto <ishimoto at gembook.org> wrote:
>  Specification
>  =============

It might help to call out which parts are changes.  If I understand
correctly, the only changes (as opposed to additions) are for
characters which are (all three of)

(a)  outside of ASCII
(b)  not broken (that is, not half of a surrogate pair)
(c)  not in the new excluded set.

>   * Characters defined in the Unicode character database as "Separator"
>     (Zl, Zp, Zs) other than ASCII space(0x20).

Please put in a note that Zl and Zp each refer to only one specific
Unicode character (U+2028 LINE SEPARATOR and U+2029 PARAGRAPH
SEPARATOR), not to what most people think of as line separators or
paragraph markers.
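
A quick way to check this, as a sketch against whatever Unicode database
ships with the running interpreter (the helper name chars_in_category is
only for illustration):

```python
import sys
import unicodedata


def chars_in_category(cat):
    """Return every code point whose general category equals `cat`."""
    return [cp for cp in range(sys.maxunicode + 1)
            if unicodedata.category(chr(cp)) == cat]


zl = chars_in_category("Zl")   # line separators
zp = chars_in_category("Zp")   # paragraph separators

# The newline most people think of is a control character, not a separator.
newline_category = unicodedata.category("\n")

print([hex(cp) for cp in zl], [hex(cp) for cp in zp], newline_category)
```

So "\n" and "\r" are not affected by the Zl/Zp rule at all; they fall
under the control-character handling.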

>   * Backslash-escape quote characters(apostrophe, ') and add quote
>     character at the beginning and the end.

Do you just mean the two ASCII quotation marks that Python uses?

As written, I wondered whether it would include backquote or guillemet.

>  - Add ``'%a'`` string format operator. ``'%a'`` converts any python
>   object to string using ``repr()`` and then hex-escape all non-ASCII
>   characters. ``'%a'`` operator generates same string as ``'%r'`` in
>   Python 2.

Then why not keep the old %r, and add a new one for the unicode repr?

Is it again because of the bug where str([..., mystr, ...]) ends up
doing repr on mystr?
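
To make that behavior concrete, here is a minimal sketch.  The class
Name is hypothetical; the point is only that str() on a builtin
container formats its contents with repr(), not str():

```python
class Name:
    """Toy object whose str() and repr() differ visibly."""

    def __init__(self, text):
        self.text = text

    def __str__(self):
        return self.text

    def __repr__(self):
        return "Name(%r)" % self.text


n = Name("Ishimoto")

direct = str(n)      # uses Name.__str__
in_list = str([n])   # list has no __str__ of its own, so repr() is used throughout

print(direct)
print(in_list)
```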

>  - Add ``ascii()`` builtin function. ``ascii()`` converts any python
>   object to string using ``repr()`` and then hex-escape all non-ASCII
>   characters. ``ascii()`` generates same string as ``repr()`` in Python 2.

The problem isn't that I want to be able to write code that acts the
old way; the problem is that I want to ensure all code running on my
system acts the old way.

Adding an ascii() function doesn't help.
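
A sketch of why, assuming the semantics described in the PEP (repr()
leaves printable non-ASCII alone, ascii() and '%a' escape it).  The
function third_party_log is hypothetical and stands in for any code I
did not write:

```python
s = "caf\u00e9"

readable = repr(s)     # non-ASCII character kept as-is
escaped = ascii(s)     # non-ASCII character hex-escaped
percent_a = "%a" % s   # same result as ascii()
percent_r = "%r" % s   # same result as repr()


def third_party_log(obj):
    """Stand-in for library code that calls repr() and cannot be edited."""
    return "value=" + repr(obj)


# My own code can switch to ascii(), but this output still carries
# the raw non-ASCII character.
logged = third_party_log(s)
print(escaped, logged.isascii())
```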

Keeping repr and adding full_repr would work (because I could look for
the new name).

Keeping repr and fixing the way it recurses when used as a str
fallback would be even better.

>   Strings to be printed for debugging are not only contained by lists or
>   dicts, but also in many other types of object. File objects contain a
>   file name in Unicode, exception objects contain a message in Unicode,
>   etc. These strings should be printed in readable form when repr()ed.
>   It is unlikely to be possible to implement a tool to print all
>   possible object types.

You could go a long way (particularly in Py3k, where everything
inherits from object) by changing the builtin containers, and changing
object.__str__ to try

     "<%s: %s>" % (type(v), iter(v))

before falling back to repr.  (You may want something that looks for
mappings and sequences instead of any iterable.  You may want to
change the exact look of the result -- the point is just to tell the
contained objects to try str.)
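
Spelled out a little more, as a rough sketch rather than a proposed
implementation: the helper name contents_str and the exact output
format are made up here, and it does not guard against containers
that contain themselves.

```python
from collections.abc import Mapping, Sequence, Set


def contents_str(v):
    """Render v like str(), but ask contained objects for str, not repr."""
    # Strings and bytes are sequences too; treat them as leaves.
    if isinstance(v, (str, bytes, bytearray)):
        return str(v)
    if isinstance(v, Mapping):
        body = ", ".join("%s: %s" % (contents_str(k), contents_str(x))
                         for k, x in v.items())
    elif isinstance(v, (Sequence, Set)):
        body = ", ".join(contents_str(x) for x in v)
    else:
        # Not a known container: ordinary str(), which itself
        # falls back to repr() when no __str__ is defined.
        return str(v)
    return "<%s: %s>" % (type(v).__name__, body)


print(contents_str(["a", "b"]))
print(contents_str({"k": ["x"]}))
```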

>  - Make the encoding used by ``unicode_repr()`` adjustable, and make
>   current ``repr()`` as default.

>   With adjustable ``repr()``, result of ``repr()`` is unpredictable and
>   would make impossible to write correct code involving ``repr()``.

No more so than 3138.  The setting of repr is predictable on a given
system.  (Even if you make it changeable during a single run, it is
predictable by checking first.)  Across systems, the 3138 proposal is
already unpredictable, because you don't know which systems will apply
backslash-replace on which characters (and on which runs).
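
For example, a sketch using the standard "backslashreplace" error
handler (the helper as_written is just for illustration): the same
string comes out differently depending on the encoding of the stream
it is written to.

```python
text = "caf\u00e9 \u3042"


def as_written(s, encoding):
    """Bytes that reach a stream with this encoding and backslashreplace."""
    return s.encode(encoding, "backslashreplace")


ascii_out = as_written(text, "ascii")     # both non-ASCII characters escaped
latin1_out = as_written(text, "latin-1")  # only U+3042 escaped
utf8_out = as_written(text, "utf-8")      # nothing escaped

for out in (ascii_out, latin1_out, utf8_out):
    print(out)
```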

-jJ