Grapheme clusters, a.k.a. real characters

Rustom Mody rustompmody at gmail.com
Sun Jul 16 03:07:07 EDT 2017


The first book I studied as a CS-student was Structured Computer Organization by Tanenbaum

Apart from the detailed description of various machines like PDP-11, IBM-360
etc., it suggested understanding the computer at 4 levels:
- Microprogramming level
- "Conventional" machine level (nowadays called ISA)
- OS level -- where system calls become new "instructions"
- HLL level of languages (like PL/1!)

[The next edition would add the digital abstraction level below the microprogramming level]

For me as for many in my generation this book and this leveled view
was an important component in my understanding of CS

A few years later I studied a course on something called "networks and networking"
Again it talked of some 7 (OSI) layers
But it didn't make much sense to someone whose only idea of a network was the
wire that connected the terminal to the (pretending) mainframe

In a subsequent edition of Networking, I found that Tanenbaum had castigated
the 7 OSI layers as useless and unnecessary with the 3 TCP layers being
more realistic

In still further(?) editions, he would introduce 5 layers as a hybrid between the
international but failed OSI standard and the ubiquitous but incomplete TCP
standard

Why am I saying all this?

A layered understanding is the bedrock of our field
Except that sometimes it works
And sometimes it doesn't

The 3 layers here are
- UTF-8 layer
- Unicode codepoint layer
- Linguistically useful (grapheme) layer
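
To make the three layers concrete, here is a rough Python sketch that counts the same text at each level. The function name layer_counts is my own invention, and the grapheme count is only an approximation: it treats every code point in a Unicode "Mark" category as belonging to the preceding cluster, which is far short of the real segmentation rules in UAX #29 (it knows nothing about Hangul jamo, emoji sequences or Indic conjuncts).

```python
import unicodedata


def layer_counts(text):
    """Return (utf8_bytes, codepoints, approx_graphemes) for text.

    The third number is a crude stand-in for grapheme clusters:
    a code point starts a new cluster unless its general category
    is a Mark (Mn, Mc, Me). This is NOT the full UAX #29 algorithm.
    """
    utf8_bytes = len(text.encode("utf-8"))   # layer 1: UTF-8 code units
    codepoints = len(text)                   # layer 2: Unicode code points
    approx_graphemes = sum(                  # layer 3: (approximate) graphemes
        1 for ch in text
        if not unicodedata.category(ch).startswith("M")
    )
    return utf8_bytes, codepoints, approx_graphemes


# 'e' followed by U+0301 COMBINING ACUTE ACCENT: one visible character
sample = "e\u0301"
print(layer_counts(sample))  # (3, 2, 1)
```

Three different answers to "how long is this string?", one per layer.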

Marko's statements like "UTF-8 is random access" are so obviously wrong that
(my guess) he does not mean them literally but elliptically, as saying:
"This excessive layering is not working"
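
Why the literal reading fails can be shown in a few lines. The sketch below (nth_codepoint_offset is a hypothetical helper of mine, and it assumes the input is valid UTF-8) finds where the n-th code point starts in a byte string. Since code points take 1 to 4 bytes each, there is no way to compute the offset; one has to walk the bytes from the start and skip continuation bytes.

```python
def nth_codepoint_offset(data, n):
    """Byte offset at which the n-th (0-based) code point starts.

    Assumes data is valid UTF-8. Continuation bytes look like
    0b10xxxxxx; every other byte begins a new code point.
    The scan is linear in the length of data.
    """
    seen = 0
    for offset, byte in enumerate(data):
        if byte & 0xC0 != 0x80:      # not a continuation byte
            if seen == n:
                return offset
            seen += 1
    raise IndexError("code point index out of range")


data = "a\u00e9\u20acb".encode("utf-8")   # a, e-acute, euro sign, b
print(len(data))                          # 7 bytes for 4 code points
print(nth_codepoint_offset(data, 3))      # 6: 'b' starts after 1 + 2 + 3 bytes
```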

OTOH statements like "level 2 is 90% good enough for level 3"
are in the same ludicrous class as "The world is as wide as the Atlantic ocean"
As pointed out above, agglutinating letters is the norm not the exception in the
world's languages up to and including (Latin in) English
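
A small Python illustration of where the code point level falls short, using only the standard unicodedata module. The sample strings are my own choice. Slicing by code point can cut a visible character in half, and NFC normalization only papers over this when Unicode happens to have a precomposed form.

```python
import unicodedata

e_acute = "e\u0301"   # 'e' + COMBINING ACUTE ACCENT
q_acute = "q\u0301"   # 'q' + COMBINING ACUTE ACCENT

# Code point slicing silently drops the accent
first = e_acute[:1]                              # just "e"

# NFC composes e + accent into the single code point U+00E9 ...
nfc_e = unicodedata.normalize("NFC", e_acute)
# ... but no precomposed q-with-acute exists, so this stays 2 code points
nfc_q = unicodedata.normalize("NFC", q_acute)

print(len(nfc_e), len(nfc_q))                    # 1 2
```

Both strings are one character to a reader; at level 2 they are not even the same length as each other.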