[issue2630] repr() should not escape non-ASCII characters

Wed Apr 16 21:37:39 CEST 2008

Marc-Andre Lemburg <mal at egenix.com> added the comment:

While it may be desirable to to have repr(unicode) return a non-ASCII
string, the suggested approach is not suitable to solve the problem.

repr() is usually used in logging and applications/users/tools don't
expect to suddenly find non-ASCII or even mixed encodings in a log file.

If you do want to have this more flexible, then make the encoding used
by unicode_repr() adjustable, turn the existing code into a codec (e.g.
"unicode-repr") and leave it setup as default.

Users who wish to see non-ASCII repr(unicode) data can then adjust the
used encoding to their liking.

This is both more flexible and backwards compatible with 2.x.

Also note that the separation of the Unicode database from the
interpreter core was done to keep the interpreter footprint manageable.
It's not a good idea to just dump the complete table set into
unicodeobject.c via an #include. If you need to reference APIs from
modules in C, the usual approach is to create a PyCObject which is then
exported by the module (see e.g. the datetime module) and imported by
code needing it.

BTW: "printable" is not a defined term in Unicode. What is or is not
printable really depends on the use case, e.g. there are quite a few
code points in Unicode that don't result in any glyph being "printed" to
the screen. A Unicode string could then look as if it had fewer code
points than it actually does - which is not really what you want when
debugging code or sifting through log files.

----------
nosy: +lemburg

__________________________________
Tracker <report at bugs.python.org>
<http://bugs.python.org/issue2630>
__________________________________