Grapheme clusters, a.k.a. real characters
Steven D'Aprano
steve at pearwood.info
Sun Jul 16 01:44:38 EDT 2017
On Sun, 16 Jul 2017 12:33:10 +1000, Ben Finney wrote:
> And yet the ASCII and Unicode standard says code point 0x0A (U+000A LINE
> FEED) is a character, by definition.
[...]
> > Is an acute accent a character?
>
> Yes, according to Unicode. ‘´’ (U+0301 ACUTE ACCENT) is a character.
Do you have references for those claims?
Because I'm pretty sure that Unicode is very, very careful to never use
the word "character" in a formal or normative manner, only as an informal
term for "the kinds of things that regular folk consider letters or
characters or similar".
And I don't think regular folks would know what a line feed was if it
jumped out of their computer and bit them :-) They would know what an
accent is, and I doubt they would consider an accent not on a base letter
to be a character. (I know I don't.)
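For anyone who wants to poke at the code points under discussion, here is a minimal sketch using the standard library's unicodedata module. The describe() helper and the "<no name>" placeholder are made up for illustration; only the name, category and normalize calls come from the module itself. It only shows what the Unicode Character Database records for each code point, which does not by itself settle what the standard means by "character".

```python
import unicodedata

def describe(ch):
    """Return (code point label, Unicode name, general category) for one code point."""
    # Control characters such as U+000A have no formal name in the
    # database, so supply a placeholder instead of raising ValueError.
    name = unicodedata.name(ch, "<no name>")
    return "U+%04X" % ord(ch), name, unicodedata.category(ch)

for ch in ("\n", "\u00b4", "\u0301"):
    print(*describe(ch))

# A base letter followed by U+0301 is two code points, but NFC
# normalisation composes the pair into the single code point U+00E9.
decomposed = "e\u0301"
composed = unicodedata.normalize("NFC", decomposed)
print(len(decomposed), len(composed))
```

Note that the database gives the name ACUTE ACCENT to U+00B4 (a spacing symbol, category Sk), while U+0301 is COMBINING ACUTE ACCENT (a nonspacing mark, category Mn), and the line feed is a control code (category Cc) with no name at all.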
--
Steve