[Python-3000] PEP 3138 - String representation in Python 3000

Thu May 15 18:13:01 CEST 2008

On Thu, May 15, 2008 at 1:18 AM, M.-A. Lemburg <mal at egenix.com> wrote:
> Atsuo,
>
> you are not really addressing my arguments in your reply.
>
> My main concern is that repr(unicode) as well as '%r' is used
> a lot in logging and debugging of applications.
>
> In the 2.x series of Python, the output of repr() has traditionally
> always been plain ASCII and does not require any special encoding
> and also doesn't run into problems when mixing the output with
> other encodings used in the log file, on the console or wherever
> the output of repr() is sent.
>
> You are now suggesting to break this convention by allowing
> all printable code points to be used in the repr() output.
> Depending on where you send the repr() output and the contents
> of the PyUnicode object, this will likely result in exceptions
> in the .write() method of the stream object.
>

I can't understand why Python 3000 should stick to an ASCII repr(). If
your concern is about output, it should be addressed by the file object
at print time. repr() generates text describing an object, and the
file encodes that text for the user's environment on output. This is a
straightforward, flexible, and common pattern in Unicode
applications.
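
A minimal sketch of that division of labour, assuming a UTF-8 stream; the
in-memory buffer and the sample string are stand-ins chosen only for
illustration, and the repr() output shown is the one proposed by the PEP:

```python
import io

# An in-memory byte buffer standing in for a log file or terminal.
raw = io.BytesIO()

# The text layer owns the encoding decision; UTF-8 is an assumption here.
stream = io.TextIOWrapper(raw, encoding="utf-8", newline="\n")

name = "日本語"
text = repr(name)           # repr() returns text (str), not bytes
stream.write(text + "\n")
stream.flush()

encoded = raw.getvalue()    # bytes produced by the stream, not by repr()
```

Here repr() never sees an encoding; only the stream object decides how the
text becomes bytes.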

> Just adjusting sys.stdout and sys.stderr to prevent them from
> falling over is not enough (and is indeed not within the scope
> of the PEP, since those changes are *major* and not warranted
> for just getting your Unicode repr() to work). repr() is very
> often written to log files and those would all have to be
> changed as well.
>

For files other than sys.std*, I see no problem with::

    log = open(filename, 'w', errors='backslashreplace')
    log.write("%r" % obj)

Although I would prefer 'backslashreplace' as the default value for errors.
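
As a rough, self-contained sketch of that pattern, assuming an
ASCII-encoded log file: the temporary path and the sample string below are
made up for illustration, and the escaped form is what 'backslashreplace'
produces for characters the encoding cannot represent:

```python
import os
import tempfile

obj = "café 日本語"

# Hypothetical log path in a throwaway temporary directory.
log_path = os.path.join(tempfile.mkdtemp(), "app.log")

# The ASCII codec cannot represent the non-ASCII characters in repr(obj);
# 'backslashreplace' turns them into escapes instead of raising
# UnicodeEncodeError in write().
with open(log_path, "w", encoding="ascii", errors="backslashreplace") as log:
    log.write("%r\n" % obj)

# Read the log back to see what actually reached the file.
with open(log_path, encoding="ascii") as log:
    logged = log.read()
```

With the default errors='strict', the same write() call would raise
UnicodeEncodeError, which is the failure the quoted message worries about.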

>  - Are there alternative ways to get the "problem" fixed ?
>  - Is the added convenience worth breaking existing conventions ?

I would like to call it an "improvement", not a break :)

>  - Is it worth breaking existing applications ?

I guess the number of applications broken by this change would be small,
and the fix would be easy.
So I think it is worth it, and perhaps a lot of programmers in non-Latin
countries would think so, too. I understand that, for you, this PEP brings
concern without any benefit. But this PEP is necessary to make the
most of Unicode's capabilities for debugging and logging.

>
> I've suggested making the repr() output configurable to address
> the convenience aspect of your proposal. You could then set the
> output encoding to e.g. "unicode-printable" and get your preferred
> output. The default could remain set to the current all-ASCII output.
>

I'm sorry, I cannot understand what the "unicode-printable" codec would do.
Could you please explain it?

I don't like making repr() adjustable (I presume you mean making
unicode_repr() in Objects/unicodeobject.c adjustable), because the old
repr() convention would remain in place. Third-party applications or
libraries could fail when I use my custom repr() function.