Grapheme clusters, a.k.a. real characters

Rick Johnson rantingrickjohnson at gmail.com
Sun Jul 16 00:52:41 EDT 2017


On Saturday, July 15, 2017 at 9:33:49 PM UTC-5, Ben Finney wrote:
> MRAB <python at mrabarnett.plus.com> writes:

[...]
    
> > Is linefeed a character? You might call it a "control
> > character", but it's not really a _character_, it's a
> > control/format _code_.
> 
> And yet the ASCII and Unicode standards say code point 0x0A
> (U+000A LINE FEED) is a character, by definition.  Rather
> than saying “no, it's not a character”, I think a more
> accurate statement would be: a linefeed *is* a character in
> ASCII, but that doesn't mean every other standard must
> agree.  Indeed it may be better to say: a line feed is a
> character and is also a control code.
> 
> > Is an acute accent a character?
> 
> Yes, according to Unicode. ‘´’ (U+0301 COMBINING ACUTE
> ACCENT) is a character.
> 
> > No, it's a diacritic mark that's added to a character.
> 
> Lose the “no”, and I agree.

So you would be happy for a string containing a single
character _decorated_ with a single accent mark (say, the
letter "a" followed by U+0303 COMBINING TILDE, which renders
the same as U+00E3 LATIN SMALL LETTER A WITH TILDE) to
return a length of 2? Really?
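
For anyone following along, here is a rough sketch of what
Python 3 reports, assuming only the standard library. Note
that U+00E3 on its own is a single precomposed code point;
the length of 2 shows up with the decomposed spelling ("a"
plus U+0303 COMBINING TILDE). The helper `approx_clusters`
is a hypothetical name invented for this illustration, and
it only skips combining marks; it is not a real grapheme
cluster segmenter in the sense of Unicode's UAX #29.

```python
import unicodedata

precomposed = "\u00e3"   # LATIN SMALL LETTER A WITH TILDE: one code point
decomposed = "a\u0303"   # "a" + COMBINING TILDE: two code points


def approx_clusters(text):
    """Count code points that are not combining marks.

    Hypothetical helper: a crude stand-in for grapheme cluster
    counting. It ignores emoji sequences, Hangul jamo, CR LF, etc.
    """
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


print(len(precomposed))              # 1
print(len(decomposed))               # 2
# NFC normalization composes the pair into the single code point.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(approx_clusters(decomposed))   # 1
```

So len() counts code points, and whether the answer is 1 or
2 depends on how the text happens to be normalized.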

> It's entirely reasonable for a concept to fit in multiple
> categories simultaneously.

Reasonable? Perhaps...

Practical? No way!



