Grapheme clusters, a.k.a. real characters

Marko Rauhamaa marko at pacujo.net
Mon Jul 17 03:09:02 EDT 2017


Mikhail V <mikhailwas at gmail.com>:

>>> On Sat, 15 Jul 2017 05:50 pm, Marko Rauhamaa wrote:
>>It's true that confusion is caused by the ambiguity of the term
>>"character."
>
> Yes, but you said "I might want random access to the 'Grapheme clusters,
> a.k.a. real characters'", and I had the impression that you had some concrete
> concept of grapheme clusters and some (generally useful) example of an
> implementation.
> Without concrete examples it is just juggling with the terms.

What did you think of my concrete examples, then? (Say, finding
"Alvárez" with the regular expression "Alv[aá]rez".)
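The "Alv[aá]rez" example can be made concrete with Python's standard `re` and `unicodedata` modules. This is an illustrative sketch, not code from the thread. `find_name` is a hypothetical helper, and normalizing to NFC before matching is one possible workaround among several. It shows that the bracketed class matches only the precomposed spelling of "á" unless the input is normalized first.

```python
import re
import unicodedata

# The same name spelled two ways: "á" as one code point (U+00E1), or as
# "a" followed by U+0301 COMBINING ACUTE ACCENT.
precomposed = "Alv\u00e1rez"
decomposed = "Alva\u0301rez"

# Escapes keep the character class unambiguous: exactly two code points.
pattern = re.compile("Alv[a\u00e1]rez")


def find_name(text):
    """Search for the name after NFC normalization (hypothetical helper).

    NFC folds "a" + U+0301 into U+00E1, so both spellings reach the
    regex in the same form.
    """
    return pattern.search(unicodedata.normalize("NFC", text)) is not None


print(pattern.search(precomposed) is not None)  # True
print(pattern.search(decomposed) is not None)   # False: the class matches "a", then U+0301 is not "r"
print(find_name(decomposed))                    # True
```

Normalization only helps where a precomposed form exists; it is no general substitute for matching whole grapheme clusters.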

> For example, I want to type the Cyrillic word "рекá" (with an acute accent
> to denote the stress on the last vowel, say for a pronunciation
> tutorial). The most common solution would be to just type á instead
> of a. And it is indeed the most practical: if I use a modifier acute accent
> character instead, then it will be hard to select/paste such text and
> it will not render accurately.
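The Cyrillic case shows why the Latin "á" is a tempting shortcut: Unicode has no precomposed Cyrillic "а" with an acute accent, so the stressed vowel has to be spelled "а" + U+0301, and NFC normalization cannot merge the two. The sketch below, built around a hypothetical `simple_clusters` helper, approximates the random access to grapheme clusters discussed in this thread by attaching each combining mark to the preceding character. It is not a full implementation of Unicode's segmentation rules (UAX #29); the third-party `regex` module's `\X` pattern is one option that aims to follow them.

```python
import unicodedata


def simple_clusters(text):
    """Split text into approximate grapheme clusters (hypothetical helper).

    Rough approximation only: each character with a nonzero canonical
    combining class is attached to the preceding character. The full
    UAX #29 rules (Hangul syllables, emoji ZWJ sequences,
    regional-indicator flags, spacing marks, ...) are not handled.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch  # attach the mark to the previous cluster
        else:
            clusters.append(ch)  # start a new cluster
    return clusters


# Cyrillic "река" with U+0301 COMBINING ACUTE ACCENT marking the stress.
word = "\u0440\u0435\u043a\u0430\u0301"

print(len(word))                                    # 5 code points
print(len(unicodedata.normalize("NFC", word)))      # still 5: no precomposed form
print(len(simple_clusters(word)))                   # 4 user-perceived characters
print(simple_clusters(word)[-1] == "\u0430\u0301")  # True
```

Indexing or slicing the returned list, rather than the raw string, keeps the accent attached to its vowel.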

Thing is, neither you (the user) nor you (the Python programmer) gets to
decide how "á" is represented in Unicode. That decision may be made by
other programmers (those who wrote the terminal emulator, the file system
or the text editor). Still, everything should be transparent to both you
(the user) and you (the Python programmer).
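One way to approximate that transparency is to normalize strings at the point where they enter the program. This is a sketch under stated assumptions: `same_text` is a hypothetical helper, and NFC as the default form is an arbitrary choice (NFD works equally well as long as both sides use the same form). Some file systems, such as HFS+ on older macOS versions, store file names in decomposed form, so a name typed as U+00E1 may come back as "a" + U+0301.

```python
import unicodedata


def same_text(a, b, form="NFC"):
    """Compare two strings after normalizing both to the same Unicode form.

    Hypothetical helper: which spelling of "á" arrives depends on the
    terminal, file system or editor, not on the Python programmer.
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


typed = "\u00e1"    # "á" as one code point
stored = "a\u0301"  # "á" as "a" + U+0301, as a decomposing file system might store it

print(typed == stored)                  # False: different code point sequences
print(same_text(typed, stored))         # True
print(same_text(typed, stored, "NFD"))  # True
```

Plain `==` compares code points, so it sees two different strings; only after normalization do they compare equal.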


Marko



More information about the Python-list mailing list