Grapheme clusters, a.k.a. real characters

Chris Angelico rosuav at gmail.com
Sun Jul 16 02:01:59 EDT 2017


On Sun, Jul 16, 2017 at 3:37 PM, Steven D'Aprano <steve at pearwood.info> wrote:
> On Sun, 16 Jul 2017 11:32:16 +1000, Chris Angelico wrote:
>
>> Exactly. That's my point. Even in a monospaced font, U+200B is a
>> character, yet it is by rule a zero-width character. So even in a
>> monospaced font, some characters must vary in width.
>
> In a *well-designed* *bug-free* monospaced font, all code points should
> be either zero-width or one column wide. Or two columns, if the font
> supports East Asian fullwidth characters.
>
> In practice, no single font is going to cover the entire range of
> Unicode. So one might hope for a *well-designed* *bug-free* FAMILY of
> monospaced fonts which, between them, cover the entire range, and agree
> on the width of a column.

Hmm, I'm not sure about that. A font can be monospaced for the most
part, yet respect multiple different "width groups" (e.g. East Asian
characters all get one width, while Latin-family characters all get a
different width). However, even in the idealized form you describe,
you still have to cope with zero-width characters (do they get zero
columns or one?), and with characters that join together (Arabic, and
Korean Hangul).
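A rough sketch of that column model, using only the standard-library
unicodedata module. The width rules here are an assumption for
illustration, not what any particular font or terminal does: combining
marks and format characters (categories Mn, Me, Cf; U+200B is Cf) count
as zero columns, East Asian Wide/Fullwidth count as two, and everything
else counts as one. The Hangul case shows the joining problem. Conjoining
jamo typed separately don't necessarily measure the same as the
precomposed syllable until you NFC-normalize. Arabic contextual shaping
happens in the font, so code-point counting like this can't see it at all.

```python
import unicodedata

def column_width(ch: str) -> int:
    """Approximate column width of one code point (illustrative rules only)."""
    # Combining marks and format characters (e.g. U+200B ZERO WIDTH SPACE)
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    # East Asian Wide and Fullwidth characters take two columns
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1

def text_width(s: str) -> int:
    """Sum of per-code-point widths; knows nothing about font shaping."""
    return sum(column_width(c) for c in s)

# Compare each string as typed with its NFC-normalized form
for s in ["abc", "e\u0301", "\u200b", "\u4e2d\u6587", "\u1100\u1161"]:
    nfc = unicodedata.normalize("NFC", s)
    print(repr(s), text_width(s), text_width(nfc))
```

For real work, the third-party wcwidth package handles many more edge
cases than this hand-rolled version does.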

I think the Liberation Sans Mono font (family??) does a pretty good
job of making most text columnate well (for instance, the narrow
spaces (thin, half, third, etc) all expand to a full space), while not
getting too het up about everything being exactly the same number of
pixels. If monospacing is, as you say, a compromise, at least Lib Sans
Mono has picked a good compromise.

ChrisA



More information about the Python-list mailing list