Glyphs and graphemes [was Re: Cult-like behaviour]

Steven D'Aprano steve+comp.lang.python at pearwood.info
Mon Jul 16 21:08:02 EDT 2018


On Tue, 17 Jul 2018 06:15:25 +1000, Chris Angelico wrote:

> On Tue, Jul 17, 2018 at 4:55 AM, Steven D'Aprano
> <steve+comp.lang.python at pearwood.info> wrote:
>> There is nothing special about diacritics such that we ought to treat
>> some combinations like "Ch" (two code points = one character) as "fixed
>> width" while others like "â" (two code points = one character) as
>> "variable width".
> 
> When you reverse a word, do you treat "ch" and "sh" as one character or
> two? 

In English, "ch" is always two letters of the alphabet. In Welsh and 
Czech, it can be one letter or two. (I think it is two letters only in 
loan words, but I'm not certain about that.) Whether that makes it one 
or two characters depends on how you define "character".

Good luck with finding a universal, objective, unambiguous definition.
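
A minimal sketch of how slippery the definition is in Python itself, 
assuming only the standard library's unicodedata module. Python's len() 
counts code points, so the same visible "â" can have length one or two 
depending on normalization, while "ch" always has length two regardless 
of which language's alphabet you have in mind. The variable names are 
just for illustration.

```python
import unicodedata

# "â" as a single precomposed code point (U+00E2)
precomposed = "\u00e2"

# The same glyph decomposed into "a" followed by a combining circumflex
decomposed = unicodedata.normalize("NFD", precomposed)

print(len(precomposed))            # 1
print(len(decomposed))             # 2
print(precomposed == decomposed)   # False: different code point sequences

# Recomposing gets us back to the single code point
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True

# A digraph is just two code points as far as str is concerned
print(len("ch"))                   # 2
```

None of these counts tells you how many "characters" a Welsh or Czech 
reader would see; that needs language-specific knowledge that str does 
not have.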


> I'm of the opinion that they're single characters, and thus this
> should be "dalokosh":
> 
> https://wiki.teamfortress.com/wiki/Dalokohs_Bar
> 
> (It's the Russian for "chocolate" - "шоколад" - transliterated to
> English/Latin - "šokolad" or "shokolad" - and then reversed.)

In English, I think most people would prefer a term other than 
"character" for whatever "sh" and "ch" represent. But you make a good 
point that even in English, we sometimes want to treat two-letter 
combinations as a single unit.
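
As a rough sketch of what "treat the digraph as a single unit" might 
look like when reversing a word: the function name reverse_units and its 
default digraph list are hypothetical, chosen for this example only, and 
a real implementation would need a per-language list of units.

```python
import re

def reverse_units(word, digraphs=("sh", "ch")):
    """Reverse word, keeping each listed digraph together as one unit."""
    # Try longer units first so they win over shorter ones
    alternatives = sorted(digraphs, key=len, reverse=True)
    # Fall back to any single code point when no digraph matches
    parts = [re.escape(d) for d in alternatives] + ["."]
    pattern = re.compile("|".join(parts), re.IGNORECASE | re.DOTALL)
    units = pattern.findall(word)
    return "".join(reversed(units))

print("shokolad"[::-1])           # dalokohs (code point by code point)
print(reverse_units("shokolad"))  # dalokosh (digraphs kept intact)
```

Passing an empty tuple for digraphs falls back to plain code point 
reversal, which is the behaviour of slicing with [::-1].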



-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



