Grapheme clusters, a.k.a. real characters

Rustom Mody rustompmody at gmail.com
Tue Jul 18 22:19:19 EDT 2017


On Wednesday, July 19, 2017 at 3:00:21 AM UTC+5:30, Marko Rauhamaa wrote:
> Chris Angelico:
> 
> > Let me give you one concrete example: the letter "ö". In English, it
> > is (very occasionally) used to indicate diaeresis, where a pair of
> > letters is not a double letter - for example, "coöperate". (You can
> > also hyphenate, "co-operate".) In German, it is the letter "o" with a
> > pronunciation mark (umlaut), and is considered the same letter as "o".
> > In Swedish, it is a distinct letter, alphabetized last (following z,
> > å, and ä, in that order). But in all these languages, it's represented
> > the exact same way.
> 
> The German Wikipedia entry on "ä" calls "ä" a letter ("Buchstabe"):
> 
>    Der Buchstabe Ä (kleingeschrieben ä) ist ein Buchstabe des
>    lateinischen Schriftsystems.
> 
>    The letter Ä (lowercase ä) is a letter of the Latin writing
>    system. [translation added]
> 
> Furthermore, it makes a distinction between "ä" the letter and "ä" the
> "a with a diaeresis:"
> 
>    In guten Druckschriften unterscheiden sich die Umlautpunkte von den
>    zwei Punkten des Tremas: Die Umlautpunkte sind kleiner, stehen näher
>    zusammen und liegen etwas tiefer.
> 
>    In good fonts umlaut dots are different from the two dots of a
>    diaeresis: the umlaut dots are smaller and closer to each other and
>    lie a little lower. [translation mine]
> 

Very interesting!
And may I take it that the two different variants of ü (u-umlaut and u-diaeresis) are not (yet) given separate seats in Unicode?
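
A quick way to poke at this from Python, as a rough sketch using only the standard unicodedata module: as far as that database shows, ü has one precomposed code point, and it decomposes to a plain u plus a single combining mark that is named for the diaeresis. The variable names here are just for illustration.

```python
import unicodedata

u_precomposed = "\u00fc"  # ü as a single code point
u_decomposed = unicodedata.normalize("NFD", u_precomposed)  # base letter + combining mark

# Name of the precomposed character; it mentions DIAERESIS, not UMLAUT.
print(unicodedata.name(u_precomposed))

# Names of the pieces after canonical decomposition.
print([unicodedata.name(c) for c in u_decomposed])

# Recomposing gives back the same single code point.
print(unicodedata.normalize("NFC", u_decomposed) == u_precomposed)
```

Whatever typographic difference a good font might draw between the two, the character database seen from Python offers only this one name for the dots.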

Now compare with:
- hyphen-minus 0x2D
− minus sign 0x2212
‐ hyphen 0x2010
– en dash 0x2013
— em dash 0x2014
― horizontal bar 0x2015
… And perhaps another half-dozen
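
The list above can be reproduced from Python's unicodedata module. This is only a sketch: DASHES and describe are names made up for the example, and the code points are the six listed above, not the other half-dozen.

```python
import unicodedata

# The six code points from the list above (nothing beyond that list).
DASHES = [0x2D, 0x2212, 0x2010, 0x2013, 0x2014, 0x2015]

def describe(cp):
    """Return the code point, official name and general category of cp."""
    ch = chr(cp)
    return "U+%04X %s (category %s)" % (
        cp, unicodedata.name(ch), unicodedata.category(ch))

for cp in DASHES:
    print(describe(cp))
```

Each look-alike comes back with its own name, and the minus sign is even filed under a different general category (math symbol) from the dashes (dash punctuation).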


