Grapheme clusters, a.k.a. real characters

Michael Torrie torriem at gmail.com
Fri Jul 14 10:32:27 EDT 2017


On 07/14/2017 08:05 AM, Rhodri James wrote:
> On 14/07/17 14:31, Marko Rauhamaa wrote:
>> Of course, UTF-8 in a bytes object doesn't make the situation any
>> better, but does it make it any worse?
> 
> Speaking as someone who has been up to his elbows in this recently, I 
> would say emphatically that it does make things worse.  It adds an extra 
> layer of complexity to all of the questions you were asking, and more. 
> A single codepoint is a meaningful thing, even if its meaning may be 
> modified by combining.  A single byte may or may not be meaningful.

Are you saying that dealing with Unicode in Google Go, which uses UTF-8
in memory, is adding an extra layer of complexity and makes things worse
than they might be in Python?
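
For concreteness, here is a minimal Python sketch of the distinction Rhodri
is drawing between code points and bytes. The sample string and the
decodes_alone() helper are my own illustration, not anything from the
thread, and this only looks at one combining sequence:

```python
import unicodedata

# Hypothetical sample: "e" followed by U+0301 COMBINING ACUTE ACCENT,
# two code points that display as a single grapheme cluster.
text = "e\u0301"

# In a str, every element is a whole code point with its own name.
code_point_names = [unicodedata.name(ch) for ch in text]

# In UTF-8, the combining accent is spread over two bytes (0xCC 0x81).
encoded = text.encode("utf-8")


def decodes_alone(byte_value):
    """Return True if this single byte is valid UTF-8 by itself."""
    try:
        bytes([byte_value]).decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Which of the individual bytes mean anything in isolation?
standalone = [decodes_alone(b) for b in encoded]

print(len(text), code_point_names)
print(len(encoded), standalone)
```

Neither view gives you the grapheme cluster directly: the str has two
elements and the bytes object has three. The difference is that each str
element can be named and classified by itself, while two of the three
bytes cannot even be decoded alone.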



