[issue5127] Use Py_UCS4 instead of Py_UNICODE in unicodectype.c

Thu Jul 8 10:16:40 CEST 2010

Ezio Melotti <ezio.melotti at gmail.com> added the comment:

[This should probably be discussed on python-dev or in another issue, so feel free to move the conversation there.]

The current implementation uses the following definition of printable: """all the characters except those characters defined in the Unicode character database as following categories are considered printable.
  * Cc (Other, Control)
  * Cf (Other, Format)
  * Cs (Other, Surrogate)
  * Co (Other, Private Use)
  * Cn (Other, Not Assigned)
  * Zl Separator, Line ('\u2028', LINE SEPARATOR)
  * Zp Separator, Paragraph ('\u2029', PARAGRAPH SEPARATOR)
  * Zs (Separator, Space) other than ASCII space('\x20')."""
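To make the rule above concrete, here is a minimal sketch in pure Python of the category test it describes. The helper name is_printable_by_category and the NON_PRINTABLE_CATEGORIES set are made up for illustration; the real check lives in C in unicodectype.c, and the result depends on the version of the Unicode database shipped with the interpreter.

```python
import unicodedata

# Categories excluded by the definition quoted above (hypothetical constant name).
NON_PRINTABLE_CATEGORIES = frozenset(
    ["Cc", "Cf", "Cs", "Co", "Cn", "Zl", "Zp", "Zs"]
)


def is_printable_by_category(ch):
    """Return True if a single character is printable under the quoted rule."""
    # ASCII space is the only Zs character that counts as printable.
    if ch == " ":
        return True
    return unicodedata.category(ch) not in NON_PRINTABLE_CATEGORIES
```

Note that nothing in this test looks at whether the code point is inside the BMP or whether a font for it is likely to be installed; only the general category matters.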

We could also arbitrarily exclude all the non-BMP chars, but IMHO that shouldn't be based on the availability of fonts.

> Note that Python3 will send printable code points as-is to the
> console, so whether or not a code point is considered printable
> should take the common availability of fonts being able to display
> the code point into account. Otherwise, a user would just see a
> square box instead of the much more useful escape sequence

If the concern is about the usefulness of repr() in the console, note that on the Windows terminal, trying to display most of these characters results in an error (see #5110), which makes repr() barely usable there.
ascii() might be an alternative if the user wants to see the escape sequence instead of a square box.
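A small illustration of the difference, as a sketch only (the sample string is made up, and printing as_repr may still fail on a console that cannot encode it):

```python
# A string with one Latin-1 character and one non-BMP character
# (U+1D11E MUSICAL SYMBOL G CLEF), written with escapes so the
# source file itself stays ASCII.
s = "caf\u00e9 \U0001d11e"

as_repr = repr(s)    # keeps printable non-ASCII characters as-is
as_ascii = ascii(s)  # escapes every non-ASCII character

# The ascii() form is safe to write to any console.
print(as_ascii)
```

With ascii() the user always gets the escape sequences, regardless of which fonts or console encoding are available.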

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue5127>
_______________________________________