Grapheme clusters, a.k.a. real characters

Random832 random832 at fastmail.com
Tue Jul 18 10:29:30 EDT 2017


On Sun, Jul 16, 2017, at 01:37, Steven D'Aprano wrote:
> In a *well-designed* *bug-free* monospaced font, all code points should 
> be either zero-width or one column wide. Or two columns, if the font 
> supports East Asian fullwidth characters.

What about Emoji?
U+1F469 WOMAN is two columns wide on its own.
U+1F4BB PERSONAL COMPUTER is two columns wide on its own.
U+200D ZERO WIDTH JOINER is zero columns wide on its own.

The sequence U+1F469 U+200D U+1F4BB is the single emoji "Woman
Technologist", which is two columns wide.
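To make this concrete in Python: the sketch below uses a hypothetical `naive_width` (not any library's API). It applies a common wcwidth-style heuristic: East Asian Wide/Fullwidth counts as 2, nonspacing/enclosing marks and format characters count as 0, and everything else counts as 1. It assumes a Python whose `unicodedata` tables are Unicode 9 or later, where these emoji are classified as Wide. Summed one code point at a time, the ZWJ sequence comes out at 4 columns, even though a terminal that supports it draws a single 2-column glyph.

```python
import unicodedata

def naive_width(text):
    """Sum per-code-point column widths, ignoring how code points combine.

    Hypothetical heuristic, not a standard: East Asian Wide/Fullwidth -> 2,
    nonspacing/enclosing marks and format characters -> 0, else -> 1.
    """
    total = 0
    for ch in text:
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue  # zero-width: combining marks and format characters such as ZWJ
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            total += 2
        else:
            total += 1
    return total

# WOMAN + ZERO WIDTH JOINER + PERSONAL COMPUTER = "Woman Technologist"
woman_technologist = "\U0001F469\u200D\U0001F4BB"
print(len(woman_technologist))          # 3
print(naive_width(woman_technologist))  # 4
```

The per-code-point sum has no way to know that the joiner merges its neighbours into a single glyph. Getting 2 requires segmenting the string into grapheme clusters first.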

This comes up even without ZWJ. The regional indicator characters are
meant to appear in pairs, signifying a flag, which is two columns wide;
when one appears in isolation, it is usually drawn as an equally wide
"letter in a box" picture.
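The pairing can be sketched as follows. The sketch pairs each run of regional indicators from left to right, which is roughly how the extended grapheme cluster rules in UAX #29 treat such runs. The helper names `pair_regional_indicators` and `ri_letters` are hypothetical. Whether a given pair is a real flag, and how wide it is drawn, is still up to the font and the terminal.

```python
RI_FIRST, RI_LAST = 0x1F1E6, 0x1F1FF  # REGIONAL INDICATOR SYMBOL LETTER A..Z

def is_regional_indicator(ch):
    return RI_FIRST <= ord(ch) <= RI_LAST

def pair_regional_indicators(text):
    """Split text into chunks, pairing runs of regional indicators left to right.

    An indicator left over at the end of an odd-length run stays on its own.
    """
    chunks = []
    i = 0
    while i < len(text):
        if (is_regional_indicator(text[i]) and i + 1 < len(text)
                and is_regional_indicator(text[i + 1])):
            chunks.append(text[i:i + 2])  # candidate flag, e.g. U+1F1FA U+1F1F8
            i += 2
        else:
            chunks.append(text[i])  # any other character, or a lone indicator
            i += 1
    return chunks

def ri_letters(chunk):
    """Map regional indicators back to ASCII letters (for debugging only)."""
    return "".join(chr(ord(ch) - RI_FIRST + ord("A")) for ch in chunk)

chunks = pair_regional_indicators("\U0001F1FA\U0001F1F8\U0001F1EC")
print([ri_letters(c) for c in chunks])  # ['US', 'G']
```

In this example the third indicator has no partner, so a terminal would typically show a flag followed by a boxed "G".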

The skin tone modifiers aren't applied with ZWJ. They are meant to
combine with the preceding character when it is an emoji depicting a
person, but in isolation they show up as a square swatch of that color.
And AIUI their combining class in the Unicode data is 0.
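One way to check this is with Python's `unicodedata` module. It assumes a Python whose Unicode tables include the emoji modifiers (Unicode 8.0 or later). The modifier is classified as a modifier symbol (category `Sk`) with canonical combining class 0. A width heuristic that only zeroes marks (`Mn`/`Me`), or that checks `unicodedata.combining()`, would therefore treat it as a separate visible character.

```python
import unicodedata

# WOMAN + EMOJI MODIFIER FITZPATRICK TYPE-4 (medium skin tone)
woman_medium = "\U0001F469\U0001F3FD"

for ch in woman_medium:
    # code point, name, general category, canonical combining class
    print(f"U+{ord(ch):04X}", unicodedata.name(ch),
          unicodedata.category(ch), unicodedata.combining(ch))
# U+1F469 WOMAN So 0
# U+1F3FD EMOJI MODIFIER FITZPATRICK TYPE-4 Sk 0
```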

Or, consider presentation variation selectors:

U+26A1 HIGH VOLTAGE SIGN
U+FE0E VARIATION SELECTOR-15 (text presentation in this context)
U+FE0F VARIATION SELECTOR-16 (emoji presentation in this context)

Some code points are meant to be shown as a text character in some
contexts and as an emoji in others. The default presentation (when the
character is not followed by a variation selector) depends on the
application. Whichever presentation is chosen, the emoji form is two
columns wide and the text form is usually one column wide.
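The rule above can be expressed as a small sketch. `presentation` and `guess_columns` are hypothetical helpers. The `default` parameter stands in for the application-dependent default when no selector follows the base character. The "emoji is 2 columns, text is usually 1" mapping is taken from the paragraph above; it is not read from any Unicode property.

```python
VS15 = "\uFE0E"  # VARIATION SELECTOR-15: request text presentation
VS16 = "\uFE0F"  # VARIATION SELECTOR-16: request emoji presentation

def presentation(seq, default="text"):
    """Return 'text' or 'emoji' for a base character optionally followed by VS15/VS16."""
    if seq.endswith(VS16):
        return "emoji"
    if seq.endswith(VS15):
        return "text"
    return default  # no selector: the application decides

def guess_columns(seq, default="text"):
    # Heuristic: emoji presentation is two columns, text presentation usually one.
    return 2 if presentation(seq, default) == "emoji" else 1

HIGH_VOLTAGE = "\u26A1"
print(guess_columns(HIGH_VOLTAGE + VS16))              # 2
print(guess_columns(HIGH_VOLTAGE + VS15))              # 1
print(guess_columns(HIGH_VOLTAGE, default="emoji"))    # 2
```

Keeping `default` as a parameter reflects the point that the same bare code point can legitimately occupy one column in one application and two in another.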

The variation selectors themselves are zero columns wide when they are
applied to a character they are not defined for.
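Python's `unicodedata` classifies both selectors as nonspacing marks (category `Mn`) with combining class 0. A hypothetical rule that gives every `Mn` code point zero columns therefore handles the stray case correctly. That rule cannot, on its own, capture the valid case, where VS16 turns a one-column text glyph into a two-column emoji.

```python
import unicodedata

for vs in ("\uFE0E", "\uFE0F"):
    print(unicodedata.name(vs), unicodedata.category(vs), unicodedata.combining(vs))
# VARIATION SELECTOR-15 Mn 0
# VARIATION SELECTOR-16 Mn 0

def columns_ignoring_marks(text):
    # Hypothetical helper: nonspacing marks (Mn) count as 0 columns, everything else as 1.
    return sum(0 if unicodedata.category(ch) == "Mn" else 1 for ch in text)

# VS16 after a plain letter, a character it is not defined for
print(columns_ignoring_marks("A\uFE0F"))  # 1
```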

(From a font perspective these can be regarded as ligatures, but the
font itself is not responsible for the behavior of a character-cell
terminal emulator.)


