Grapheme clusters, a.k.a. real characters

Steven D'Aprano steve at pearwood.info
Sun Jul 16 01:44:38 EDT 2017


On Sun, 16 Jul 2017 12:33:10 +1000, Ben Finney wrote:

> And yet the ASCII and Unicode standard says code point 0x0A (U+000A LINE
> FEED) is a character, by definition.
[...]
> > Is an acute accent a character?
> 
> Yes, according to Unicode. ‘´’ (U+0301 ACUTE ACCENT) is a character.


Do you have references for those claims?

Because I'm pretty sure that Unicode is very, very careful to never use 
the word "character" in a formal or normative manner, only as an informal 
term for "the kinds of things that regular folk consider letters or 
characters or similar".

And I don't think regular folks would know what a line feed was if it 
jumped out of their computer and bit them :-) They would know what an 
accent is, and I doubt they would consider an accent not on a base letter 
to be a character. (I know I don't.)
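
For anyone who wants to see what the Unicode Character Database itself 
records for the code points under discussion, here is a minimal sketch 
using the standard library's unicodedata module. The describe() helper 
and the "<no name>" placeholder are my own inventions for illustration, 
and the results depend on the Unicode version bundled with your Python. 
It does not settle what "character" means; it only shows the name, 
general category and combining class stored for each code point.

```python
import unicodedata

def describe(ch):
    """Return (label, name, general category, combining class) for one code point."""
    return (
        "U+%04X" % ord(ch),
        # Control characters such as LINE FEED have no formal name in the
        # database, so name() needs a default to avoid raising ValueError.
        unicodedata.name(ch, "<no name>"),
        unicodedata.category(ch),
        unicodedata.combining(ch),
    )

# U+000A, the spacing accent U+00B4, and the combining accent U+0301.
for ch in ("\n", "\u00b4", "\u0301"):
    print(describe(ch))

# A base letter followed by the combining accent is two code points;
# NFC normalization composes them into the single code point U+00E9.
decomposed = "e\u0301"
composed = unicodedata.normalize("NFC", decomposed)
print(len(decomposed), len(composed))
```

On the Python versions I have tried, the database names U+0301 as 
COMBINING ACUTE ACCENT (category Mn, a non-spacing mark) and U+00B4 as 
plain ACUTE ACCENT (category Sk), while U+000A is category Cc with no 
name at all.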


-- 
Steve


