Grapheme clusters, a.k.a. real characters

Chris Angelico rosuav at gmail.com
Mon Jul 17 12:04:04 EDT 2017


On Tue, Jul 18, 2017 at 1:36 AM, Steve D'Aprano
<steve+python at pearwood.info> wrote:
> On Mon, 17 Jul 2017 02:10 pm, Rustom Mody wrote:
>> Hint1: Ask your grandmother whether unicode's notion of character makes sense.
>
> What on earth makes you think that my grandmother is a valid judge of whether
> Unicode makes sense or not?
>
> She made some mighty fine chicken soup, and her coffee scroll cake was to die
> for, but I wouldn't want to ask her to fix my car, perform brain surgery, solve
> a differential equation, or judge the merits of a technical standard like
> Unicode.
>
> Her English wasn't that great, her Russian was more of a country-bumpkin dialect
> than Standard Russian, and it was mixed in with a lot of Estonian and Polish as
> well, and she had *absolutely zero* knowledge of different language systems
> like Chinese ideographs, Arabic, Hindi, etc. Nor did she know anything about
> the legacy encodings of the 1980s and 90s.
>
> How could she possibly be expected to judge Unicode? She never even handled a
> computer in her life, let alone program one. How could she judge the complex
> balancing act between competing requirements that go into Unicode?

I think the point here is not about judging Unicode, but defining a
character. If I were to ask either of my (late) grandmothers what a
character is, aside from being told that I am myself quite a
character, I'd probably get a reasonably sane response for text in
English, Italian, or Dutch, with the possible exception that "ij"
might be considered a single letter in Dutch. Except when it isn't.
But neither of them is qualified to say whether и and й are the same
letter or not, as both of them would think they were badly written
upper-case N. Nor would I ask either of them whether 다 is one
character or two. The "ask your grandmother" technique is great for
questions of UI within her area of skill, but that's about it.
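
To make those examples concrete, here is a rough sketch (mine, not
anything from the earlier posts) of how Python 3 and the stdlib
unicodedata module see them. The helper name code_points is made up
for illustration, and I'm assuming the precomposed forms of й, 다,
and the U+0133 "ij" ligature as the starting points.

```python
import unicodedata

def code_points(text):
    """Return the code points of text as 'U+XXXX' strings (hypothetical helper)."""
    return [f"U+{ord(ch):04X}" for ch in text]

# й as a single precomposed code point, CYRILLIC SMALL LETTER SHORT I.
short_i = "\u0439"
# Canonical decomposition (NFD) splits it into и plus COMBINING BREVE.
short_i_nfd = unicodedata.normalize("NFD", short_i)

# 다 as a single precomposed Hangul syllable.
da = "\ub2e4"
# NFD splits it into two conjoining jamo, a leading consonant and a vowel.
da_nfd = unicodedata.normalize("NFD", da)

# The Dutch "ij" also exists as one compatibility ligature code point.
ij_ligature = "\u0133"
# Compatibility normalization (NFKC) turns it into the two letters i and j.
ij_nfkc = unicodedata.normalize("NFKC", ij_ligature)

print(len(short_i), code_points(short_i_nfd))   # 1 ['U+0438', 'U+0306']
print(len(da), code_points(da_nfd))             # 1 ['U+B2E4' splits into 'U+1103', 'U+1161']
print(len(ij_ligature), code_points(ij_nfkc))   # 1 ['U+0069', 'U+006A']
```

So len() answers "one or two?" differently depending on the
normalization form, which is rather the point: counting code points
is not the same as counting what a reader would call characters.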

ChrisA


