Grapheme clusters, a.k.a. real characters
Steven D'Aprano
steve at pearwood.info
Sun Jul 16 01:37:15 EDT 2017
On Sun, 16 Jul 2017 11:32:16 +1000, Chris Angelico wrote:
> On Sun, Jul 16, 2017 at 11:20 AM, Rick Johnson
> <rantingrickjohnson at gmail.com> wrote:
>> On Saturday, July 15, 2017 at 7:29:14 PM UTC-5, Chris Angelico wrote:
>>> [...] Also, that doesn't deal with U+200B or U+180E, which have
>>> well-defined widths *smaller* than typical Latin letters. (200B is a
>>> zero-width space. Is it a character?)
>>
>> Of *COURSE* it's a character.
>>
>> Would you also consider 0 not to be a number?
>>
>> Sheesh!
>
> Exactly. That's my point. Even in a monospaced font, U+200B is a
> character, yet it is by rule a zero-width character. So even in a
> monospaced font, some characters must vary in width.
In a *well-designed* *bug-free* monospaced font, all code points should
be either zero-width or one column wide. Or two columns, if the font
supports East Asian fullwidth characters.

In practice, no single font is going to cover the entire range of
Unicode. So one might hope for a *well-designed* *bug-free* FAMILY of
monospaced fonts which, between them, cover the entire range, and agree
on the width of a column.

But even in this best of all possible situations, you can't make everyone
happy, because there exist *thin spaces* which should render as a
fraction of the width of a regular space. But a monospaced font can't do
that: it either makes the thin space zero-width, or a full column.

Monospace is by its very nature a compromise on the "natural" width of
the characters. A sometimes *useful* compromise, but it cannot solve all
problems.
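
The zero/one/two column model can be sketched in Python using only the
standard unicodedata module. The names column_width and display_width are
made up for illustration, and the rules are a deliberate simplification
(control characters, for one, are not handled). This is not the algorithm
any particular terminal or font actually uses.

```python
import unicodedata

def column_width(ch):
    """Columns one code point would occupy in an idealised monospaced font.

    Hypothetical helper: the rules below are an assumption made for
    illustration, not a standard algorithm.
    """
    # Combining marks (Mn, Me) and format characters (Cf, which includes
    # U+200B ZERO WIDTH SPACE) take no column at all.
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    # East Asian Wide (W) and Fullwidth (F) characters take two columns.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    # Everything else is forced to exactly one column, thin spaces included.
    return 1

def display_width(text):
    """Sum of the per-code-point column widths of a string."""
    return sum(column_width(ch) for ch in text)

print(column_width("a"))        # 1: ordinary Latin letter
print(column_width("\u200b"))   # 0: ZERO WIDTH SPACE
print(column_width("\uff21"))   # 2: FULLWIDTH LATIN CAPITAL LETTER A
print(column_width("\u2009"))   # 1: THIN SPACE, rounded up to a full column
print(display_width("e\u0301")) # 1: "e" plus a combining acute accent
```

Note that THIN SPACE comes out as one full column. A whole number of
columns has no way to express "a fraction of a regular space", which is
exactly the compromise described above.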
--
Steve