Grapheme clusters, a.k.a. real characters

Rhodri James rhodri at kynesim.co.uk
Fri Jul 14 10:05:37 EDT 2017


On 14/07/17 14:31, Marko Rauhamaa wrote:
> Of course, UTF-8 in a bytes object doesn't make the situation any
> better, but does it make it any worse?

Speaking as someone who has been up to his elbows in this recently, I 
would say emphatically that it does make things worse.  It adds an extra 
layer of complexity to all of the questions you were asking, and more. 
A single codepoint is a meaningful thing, even if its meaning may be
modified by combining.  A single byte may or may not be meaningful: in
UTF-8 it can be a whole ASCII character or only a fragment of a
multi-byte sequence.
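
To make the codepoint/byte distinction concrete, here is a minimal
sketch using only the standard library.  The sample string and the
helper name decodes_alone are illustrative assumptions, not anything
from the thread.  It takes "e" followed by a combining acute accent,
looks up the name of each codepoint, then checks whether each byte of
the UTF-8 encoding can be decoded on its own.

```python
import unicodedata

# 'e' followed by U+0301 COMBINING ACUTE ACCENT (illustrative sample).
text = "e\u0301"

# Every codepoint has a name of its own, even the combining one.
names = [unicodedata.name(ch) for ch in text]

# The same text as UTF-8: one byte for 'e', two for the accent.
data = text.encode("utf-8")


def decodes_alone(byte_value):
    """Return True if this single byte is valid UTF-8 by itself.

    Hypothetical helper, for illustration only.
    """
    try:
        bytes([byte_value]).decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Only the ASCII byte stands alone; the other two are fragments.
standalone = [decodes_alone(b) for b in data]

print(names)       # ['LATIN SMALL LETTER E', 'COMBINING ACUTE ACCENT']
print(data)        # b'e\xcc\x81'
print(standalone)  # [True, False, False]
```

Slicing the str at any index still yields named codepoints, whereas
slicing the bytes object between 0xCC and 0x81 yields something that
cannot be decoded at all.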

-- 
Rhodri James *-* Kynesim Ltd


