Grapheme clusters, a.k.a. real characters

Steven D'Aprano steve at pearwood.info
Sun Jul 16 01:37:15 EDT 2017


On Sun, 16 Jul 2017 11:32:16 +1000, Chris Angelico wrote:

> On Sun, Jul 16, 2017 at 11:20 AM, Rick Johnson
> <rantingrickjohnson at gmail.com> wrote:
>> On Saturday, July 15, 2017 at 7:29:14 PM UTC-5, Chris Angelico wrote:
>>> [...] Also, that doesn't deal with U+200B or U+180E, which have
>>> well-defined widths *smaller* than typical Latin letters. (200B is a
>>> zero-width space. Is it a character?)
>>
>> Of *COURSE* it's a character.
>>
>> Would you also consider 0 not to be a number?
>>
>> Sheesh!
> 
> Exactly. That's my point. Even in a monospaced font, U+200B is a
> character, yet it is by rule a zero-width character. So even in a
> monospaced font, some characters must vary in width.

In a *well-designed* *bug-free* monospaced font, every code point should
be either zero-width or one column wide, or two columns wide if the font
supports East Asian fullwidth characters.
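
As a rough sketch of that zero/one/two column rule, here is how it might
look using only the standard library's unicodedata module. The name
column_width is made up for illustration, and the classification is a
simplification: it ignores control characters and the many special cases
that real terminal width tables handle.

```python
import unicodedata

def column_width(ch):
    """Approximate how many monospace columns one code point occupies.

    Hypothetical helper: a simplified rule, not a full width table.
    """
    # Combining marks (Mn, Me) and format characters (Cf), such as
    # U+200B ZERO WIDTH SPACE, take no column of their own.
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    # East Asian Wide (W) and Fullwidth (F) code points take two columns.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    # Everything else is assumed to take exactly one column.
    return 1

def display_width(text):
    """Sum the approximate column widths of every code point in text."""
    return sum(column_width(ch) for ch in text)
```

With this sketch, column_width("\u200b") is 0, column_width("a") is 1,
and column_width("\u4e2d") is 2.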

In practice, no single font is going to cover the entire range of 
Unicode. So one might hope for a *well-designed* *bug-free* FAMILY of 
monospaced fonts which, between them, cover the entire range, and agree 
on the width of a column.

But even in this best of all possible situations, you can't make everyone 
happy, because there exist *thin spaces* which should render as a 
fraction of the width of a regular space. But a monospaced font can't do 
that: it either makes the thin space zero-width, or a full column.
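
To see why the data alone can't help here, this small sketch (the
helper name describe is made up for illustration) looks up what the
unicodedata module records for the regular space, THIN SPACE and HAIR
SPACE. It reports a name, a general category and an East Asian width
class, but nothing like a fractional width, so anything laying text out
on a column grid has to pick a whole number of columns for them.

```python
import unicodedata

def describe(codepoint):
    """Return (name, general category, East Asian width class)."""
    ch = chr(codepoint)
    return (
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.east_asian_width(ch),
    )

# U+0020 SPACE, U+2009 THIN SPACE, U+200A HAIR SPACE
for cp in (0x0020, 0x2009, 0x200A):
    name, category, width_class = describe(cp)
    print(f"U+{cp:04X} {name}: category={category}, width class={width_class}")
```

All three are in category Zs (space separator), the same as an ordinary
space; how wide they actually render is left to the font.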

Monospace is by its very nature a compromise on the "natural" width of 
the characters. A sometimes *useful* compromise, but it cannot solve all 
problems.


-- 
Steve


