[Python-3000] Displaying strings containing unicode escapes

"Martin v. Löwis" martin at v.loewis.de
Thu May 1 19:12:07 CEST 2008


> I still like this proposal. I don't quite understand the competing (?)
> proposal by Stephen Turnbull; perhaps Stephen can compare and contrast
> the two proposals? And where does Atsuo fall?

IIUC, Stephen proposes to use some of the "security" algorithms for
display, without (yet) specifying which one.

I don't think they apply, as these algorithms are designed for
identifiers (in particular for use in programming languages and
domain names); any character classified as "confusing" would get
escaped. As Stephen elaborates, that would have the undesirable
side effect of escaping the Cyrillic A (i.e. А), and likewise
some Greek letters. In any case, one would first have to write a
precise specification (UTR#36/#39 leave options open), and probably
extend the tables in unicodedata.

Atsuo's latest proposal (http://wiki.python.org/moin/Python3kStringRepr)
is an elaboration of mine, I think. I would have phrased it slightly
differently, i.e.

- all Z* and C* characters get escaped, plus backslash; the only
  exception is space. In UCS-2 builds, surrogates get escaped only
  if they are unpaired.
- escaping looks like this:
  * \r, \n, \t, \\
  * \xXX for characters from Latin-1
  * \uXXXX for characters from the BMP
  * \U00XXXXXX for anything else
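
The rules above can be sketched roughly as follows. This is only an
illustration, not the proposal's implementation: the names
escape_char and escape_string are hypothetical, and it assumes a
Python where str is a sequence of code points, so the UCS-2 surrogate
pair rule does not come up (a lone surrogate has category Cs and is
simply escaped).

```python
import unicodedata

# Short escapes named in the list above.
_SHORT = {"\r": "\\r", "\n": "\\n", "\t": "\\t", "\\": "\\\\"}


def escape_char(ch):
    """Escape one character per the rules sketched above (hypothetical helper)."""
    if ch in _SHORT:
        return _SHORT[ch]
    category = unicodedata.category(ch)
    # Only separators (Z*) and "other" (C*) are escaped; space is exempt.
    if ch == " " or category[0] not in "ZC":
        return ch
    cp = ord(ch)
    if cp <= 0xFF:
        return "\\x%02x" % cp      # Latin-1 range
    if cp <= 0xFFFF:
        return "\\u%04x" % cp      # rest of the BMP
    return "\\U%08x" % cp          # anything else


def escape_string(s):
    return "".join(escape_char(ch) for ch in s)


print(escape_string("A\u0410 \xa0\n"))  # Cyrillic A stays, NBSP and newline are escaped
```

Note that the Cyrillic A passes through unchanged here, since it is a
letter (Lu) and not in Z* or C*.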

What my original proposal lacked was escaping of Zs characters
other than space, which would then also escape NBSP, EN QUAD,
EM QUAD, THIN SPACE, HAIR SPACE, OGHAM SPACE MARK, etc. Escaping
them is fine, too. I also didn't originally consider surrogate
pairs in UCS-2 builds; they should (of course) get represented
as-is.
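
To check which characters this affects, one can look up the general
category in unicodedata. A small sketch, assuming the formal Unicode
names for the characters listed above (NBSP is NO-BREAK SPACE); the
variable names are made up for illustration:

```python
import unicodedata

# Formal Unicode names of the space-like characters mentioned above.
ZS_NAMES = [
    "NO-BREAK SPACE",
    "EN QUAD",
    "EM QUAD",
    "THIN SPACE",
    "HAIR SPACE",
    "OGHAM SPACE MARK",
]

# Map each name to (code point, general category).
categories = {
    name: (ord(unicodedata.lookup(name)), unicodedata.category(unicodedata.lookup(name)))
    for name in ZS_NAMES
}

for name, (cp, cat) in categories.items():
    print("U+%04X %-18s %s" % (cp, name, cat))

# Plain space is in the same category; it is exempt only by special rule.
print(unicodedata.category(" "))
```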

The remaining issue is the output of repr to a device, which may go
wrong in two ways:
- the device claims it supports the character, but doesn't actually
  have a glyph for it. In that case, the terminal encoding should
  be adjusted.
- the device cannot display certain characters in the repr. Here,
  an escaping error handler can be used if desired.
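
For the second case, a minimal sketch, assuming the standard
"backslashreplace" codec error handler is the kind of escaping
handler meant here. The function name encode_for_device is
hypothetical:

```python
def encode_for_device(text, encoding):
    """Encode text for a device, escaping whatever the encoding cannot represent."""
    # "backslashreplace" substitutes \xXX, \uXXXX or \UXXXXXXXX for
    # characters that the target encoding has no representation for.
    return text.encode(encoding, errors="backslashreplace")


sample = "caf\u00e9 \u0410"  # e-acute, then Cyrillic A

print(encode_for_device(sample, "ascii"))    # both non-ASCII characters escaped
print(encode_for_device(sample, "latin-1"))  # only the Cyrillic A escaped
print(encode_for_device(sample, "utf-8"))    # nothing escaped
```

This way the string itself keeps non-ASCII letters as-is, and escaping
happens only at the point where the output encoding falls short.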

Regards,
Martin


