Grapheme clusters, a.k.a. real characters

Steve D'Aprano steve+python at pearwood.info
Mon Jul 17 11:36:03 EDT 2017


On Mon, 17 Jul 2017 02:10 pm, Rustom Mody wrote:

>> Please don't feed the trolls.
> 
> Its usually called 'joke' Steven! Did the word fall out of your dictionary
> in the last upgrade?
> Rick was no more trolling than Marko 

Funny you say that. I often think Marko is trolling, but if he is, he does a
good job of leaving me in just enough doubt that I'm willing to continue the
discussion.

As for Rick, I can't tell if he's merely trolling to get a reaction, or he
really does believe the crap he spouts off in most of his posts. I'm not sure
which would be worse.


> or you or Chris or Mikhail or anyone else 
> If anyone's trolling its me…  len("á") == 1½ is so obviously nonsense on so
> many levels I did not think
> "And now ladies (are there any?) and gentlemen I am going to tell a joke!"
> would be necessary

And it wouldn't have been necessary, if we didn't have Ranting Rick here to take
your proposal seriously.


> On a more serious note every other post on this (as on many discussing unicode
> more broadly) is so ridiculously Euro (or Anglo) centric I would not know
> where to begin.

I'm always willing to learn. How am I Euro, or Anglo, centric?



> Witness your own…
[...]
> You've given 4 ifs.

Actually I gave five "ifs", plus one other conditional phrase which could have
been re-worded as an "if".


> An L-language may would assume that the atomic units of language-L would
> be supported.  Your 4th if suggests thats ok. Is it?

Please pardon me for being Anglo-centric, but what's an L-language?

People make lots of bad assumptions. For example, they assume that computer
arithmetic must follow the same mathematical rules of associativity,
commutativity and distributivity that they learned about the Real number system
in high school. That assumption is wrong.
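
A minimal sketch of that point, assuming IEEE 754 double-precision floats
(which is what CPython uses on common platforms). It shows only the
associativity case, since that is the easiest one to demonstrate; the variable
names are just for illustration:

```python
import math

a, b, c = 0.1, 0.2, 0.3

# Grouping the same three terms differently gives different results,
# because each intermediate sum is rounded to the nearest double.
left = (a + b) + c    # 0.6000000000000001
right = a + (b + c)   # 0.6

print(left == right)              # False
print(math.isclose(left, right))  # True: close, but not equal
```

The two results differ only in the last bit, which is exactly why code that
compares floats with == tends to surprise people.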

People assume that the atomic units of language are simple to define and,
once defined, simple to support in programming languages. That assumption
is also wrong.

People assume all sorts of falsehoods about programming, and language. So to
answer your question, no, it is not okay to assume that the "atomic units of
language" (whatever they are) are supported.

I don't think that it is even a given that "atomic units of language" exist. To
quote a Hindi speaker earlier in this thread, की is a letter, and yet it can be
decomposed into की = क + ई, so it isn't "atomic". If letters aren't atomic,
then what are?
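
For what it's worth, here is a rough sketch of how that letter looks from
Python's side, using only the standard unicodedata module. One caveat: as
stored in Unicode, the second code point is the dependent vowel sign (U+0940)
rather than the independent letter ई (U+0908), so the "क + ई" decomposition
above is the linguistic description, not the literal code point sequence:

```python
import unicodedata

ki = "\u0915\u0940"  # की written with explicit escapes

# One letter to a reader, two code points to Python.
print(len(ki))  # 2

names = [unicodedata.name(ch) for ch in ki]
for ch, name in zip(ki, names):
    print(f"U+{ord(ch):04X} {name}")

# The independent vowel is a different code point altogether.
independent_ii = unicodedata.name("\u0908")
```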

So if the "atomic units of language" (letters?) have "subatomic parts", where
does that leave us programmers? Shouldn't we be able to manipulate text at the
subatomic level?
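
As an illustration of working at that "subatomic" level, here is a small
sketch using unicodedata.normalize on the "á" from earlier in the thread. It
assumes nothing beyond the standard library; the variable names are mine:

```python
import unicodedata

composed = "\u00e1"  # á as a single precomposed code point

# NFD splits it into a base letter plus a combining accent.
decomposed = unicodedata.normalize("NFD", composed)
print(len(composed), len(decomposed))  # 1 2

# The pieces can be inspected, or manipulated, individually.
base, accent = decomposed
print(base, unicodedata.name(accent))  # a COMBINING ACUTE ACCENT

# NFC puts them back together again.
recomposed = unicodedata.normalize("NFC", decomposed)
print(recomposed == composed)  # True
```

So the same "character" has length 1 or 2 depending on the normalization form,
and neither answer is 1½.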


> Hint1: Ask your grandmother whether unicode's notion of character makes sense.

What on earth makes you think that my grandmother is a valid judge of whether
Unicode makes sense or not?

She made some mighty fine chicken soup, and her coffee scroll cake was to die
for, but I wouldn't want to ask her to fix my car, perform brain surgery, solve
a differential equation, or judge the merits of a technical standard like
Unicode.

Her English wasn't that great, her Russian was more of a country-bumpkin dialect
than Standard Russian, and it was mixed in with a lot of Estonian and Polish as
well, and she had *absolutely zero* knowledge of different language systems
like Chinese ideographs, Arabic, Hindi, etc. Nor did she know anything about
the legacy encodings of the 1980s and 90s.

How could she possibly be expected to judge Unicode? She never even handled a
computer in her life, let alone programmed one. How could she judge the complex
balancing act between the competing requirements that go into Unicode?

It's really sad to see somebody who I thought was educated espousing the view
that knowledge and education aren't needed to judge complex technical
questions, only common sense[1]. Experts? Who needs 'em?


> Ask 10 gmas from 10 language-L's
> Hint2: When in doubt gma usually is right

Would you let your grandmother perform brain surgery on someone you cared for?

Well, maybe, if she actually was a brain surgeon. But if not?




[1] http://i.imgur.com/jgmwz1q.jpg

-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.



