Glyphs and graphemes [was Re: Cult-like behaviour]

Steven D'Aprano steve+comp.lang.python at pearwood.info
Mon Jul 16 21:26:55 EDT 2018


On Mon, 16 Jul 2018 22:51:32 +0300, Marko Rauhamaa wrote:

> All UTF-8. No unicode strings.

That just means you are re-implementing the bits of Unicode you care 
about (which may be "nothing at all") as UTF-8. If your application is 
nothing but middleware squirting bytes from one layer to another layer, 
that might be all you need care about.

But then you're not processing text in your application, and why should 
your experience in not-processing-text be given any weight over the 
experiences of those who do process text?
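
To make the distinction concrete, here is a small sketch (mine, not
anything from Marko's application) of what happens when you treat UTF-8
bytes as if they were text. The sample string and the variable names are
just illustrative assumptions, and it assumes Python 3:

```python
# "naive" with a precomposed i-diaeresis (U+00EF); purely an example string.
text = "na\u00efve"
data = text.encode("utf-8")

# Counting: five code points, but six bytes, since U+00EF takes two bytes.
print(len(text), len(data))

# Case conversion: bytes.upper() only knows about ASCII letters,
# so the non-ASCII letter is silently left in lower case.
print(text.upper())
print(data.upper().decode("utf-8"))

# Reversal: reversing the bytes splits the two-byte sequence apart,
# leaving something that is no longer valid UTF-8.
try:
    data[::-1].decode("utf-8")
except UnicodeDecodeError as exc:
    print("reversed bytes are not valid UTF-8:", exc.reason)
print(text[::-1])
```

None of this matters if all you do is pass the bytes along untouched.
It starts to matter the moment you count, slice, compare or change case.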


And later, in another post:

> UTF-8 bytes can only represent the first 128 code points of Unicode.

This is DailyWTF material. Perhaps you want to rethink your wording and 
maybe even learn a bit more about Unicode and the UTF encodings before 
making such statements.

The idea that UTF-8 bytes cannot represent the whole of Unicode is not 
even wrong. Of course a *single* byte cannot, but a single byte is not 
"UTF-8 bytes".
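
This is easy enough to check at the interpreter. A rough sketch, assuming
Python 3.8 or later (for the separator argument to bytes.hex); the
function name utf8_round_trips is my own invention for the example:

```python
# One sample from each UTF-8 length class, plus the last code point.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600", "\U0010FFFF"]
for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded.hex(' ')}")


def utf8_round_trips(cp):
    """Return True if code point cp survives encoding to UTF-8 and back."""
    ch = chr(cp)
    return ch.encode("utf-8").decode("utf-8") == ch


# Every code point from U+0000 to U+10FFFF, skipping the surrogates
# (U+D800 to U+DFFF), which exist only for UTF-16 and are not encodable.
all_ok = all(
    utf8_round_trips(cp)
    for cp in range(0x110000)
    if not 0xD800 <= cp <= 0xDFFF
)
print(all_ok)
```

Only the first 128 code points fit in a *single* byte. Everything else
takes two, three or four bytes, and all of them round-trip.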


-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



