Grapheme clusters, a.k.a. real characters

Steve D'Aprano steve+python at pearwood.info
Wed Jul 19 21:30:43 EDT 2017


On Thu, 20 Jul 2017 01:30 am, Random832 wrote:

> On Tue, Jul 18, 2017, at 22:49, Steve D'Aprano wrote:
>> > What about Emoji?
>> > U+1F469 WOMAN is two columns wide on its own.
>> > U+1F4BB PERSONAL COMPUTER is two columns wide on its own.
>> > U+200D ZERO WIDTH JOINER is zero columns wide on its own.
>> 
>> 
>> What about them? In a monospaced font, they should follow the same rules
>> I used above: either 0, 1 or 2 columns wide.
> 
> You snipped the important part - the fact that the whole sequence of
> three code points U+1F469 U+200D U+1F4BB is a single grapheme cluster
> two columns wide.

There's no requirement for rendering engines to display the emoji sequence in
any specific way. Maybe we would like the combined emoji to display in two
columns, but that's not guaranteed, nor is it required by the standard.

http://unicode.org/emoji/charts/emoji-zwj-sequences.html

If the renderer cannot display a "Woman Personal Computer" as a single emoji, it
is permissible to fall back to two glyphs.
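
To make that concrete, here's a rough sketch in Python. It is my own
illustration, not anything the standard prescribes. The helper naive_width is
a made-up name, and it assumes the simple 0/1/2 column rule applied to each
code point separately, with a unicodedata module built against Unicode 9.0 or
later so that these emoji are classed as East Asian Wide:

```python
import unicodedata

# WOMAN, ZERO WIDTH JOINER, PERSONAL COMPUTER
seq = "\U0001F469\u200D\U0001F4BB"

def naive_width(s):
    """Sum column widths code point by code point (hypothetical helper).

    Assumed rule: 0 columns for combining marks and format characters,
    2 columns for East Asian Wide/Fullwidth, 1 column for everything else.
    """
    total = 0
    for ch in s:
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue  # zero width, e.g. U+200D ZERO WIDTH JOINER
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            total += 2
        else:
            total += 1
    return total

for ch in seq:
    print("U+%04X" % ord(ch), unicodedata.name(ch))

print(len(seq))          # 3 code points
print(naive_width(seq))  # 4 columns under the per-code-point rule
```

The sum comes to 4 columns, which is what you would see if the renderer falls
back to two separate glyphs. A renderer that supports the ZWJ sequence would
presumably draw one glyph, typically two columns wide, and nothing in the
string itself tells you which of those you will get.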


> You also ignored all of the other examples in my post. Did you even read
> anything beyond what you snipped?

Yes I did, but I didn't understand it. Maybe that was because I didn't read your
post carefully enough, or maybe it was because you didn't explain what point
you were making carefully enough. Or a little of both.



-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.



