Grapheme clusters, a.k.a. real characters

Rick Johnson rantingrickjohnson at gmail.com
Sat Jul 15 10:31:35 EDT 2017


On Friday, July 14, 2017 at 12:43:50 PM UTC-5, Steve D'Aprano wrote:
> Before you answer, does your answer apply to Arabic and
> Thai as well as Western European languages?

I find it interesting that those who bellyache the loudest
about the "inclusivity of regional character encodings" never
dabble much outside their _own_ basic English set. For
instance: I never hear Chinese or Eastern Europeans
bellyaching about how ASCII forced them to use a standard
keyboard and denied them the "gawd given right" to become an
amateur space cadet[1]! Nope, they just learn English and move
on.

> [...]
>
> As for the legacy encodings:
> 
> - they're not 7-bit clean, except for ASCII;
> 
> - some of them are variable-width;
> 
> - none of them support the full range of Unicode, so they
> aren't universal character sets;
> 
> - in other words, you either resign yourself to being
> unable to exchange documents with other people, resign
> yourself to dealing with moji-bake, or invent some complex
> and non-backwards-compatible in-band mechanism for
> switching charsets;
> 
> - they suffer from the exact same problems as Unicode
> regarding the distinction between code points and
> graphemes;
> 
> - so not only do they lack the advantages of Unicode, but
> they have even more disadvantages.

Thanks for finally admitting that Unicode is not the
cure-all that you Unicode cultists make it out to be.
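
For anyone following along, the code point versus grapheme
distinction in the quoted list, plus two of the points about
legacy encodings, can be seen with nothing but the standard
library. This is only a minimal sketch: the sample strings and
the choice of Latin-1 and Shift_JIS are my own picks for
illustration, not anything taken from the quoted post.

```python
import unicodedata

# One user-perceived character, two code points: 'e' + COMBINING ACUTE ACCENT.
decomposed = "e\u0301"
# NFC normalization composes the pair into the single code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed))         # 2
print(len(composed))           # 1
print(decomposed == composed)  # False

# Legacy encodings: Latin-1 uses a byte above 0x7F (not 7-bit clean) ...
latin1_bytes = "\u00e9".encode("latin-1")
print(latin1_bytes)            # b'\xe9'

# ... and Shift_JIS is variable-width: one byte for 'a', two for U+65E5.
sjis_ascii = "a".encode("shift_jis")
sjis_kanji = "\u65e5".encode("shift_jis")
print(len(sjis_ascii), len(sjis_kanji))  # 1 2
```

Note that len() counts code points, not graphemes, so the two
spellings of the same visible character compare unequal until
they are normalized to the same form.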


[1] Possibly with the exception of Xan Lee. ;-). BTW, what
happened to the old chap?


