Grapheme clusters, a.k.a. real characters

Neil Cerutti neilc at norwich.edu
Fri Jul 14 14:22:46 EDT 2017


On 2017-07-14, Rhodri James <rhodri at kynesim.co.uk> wrote:
> On 14/07/17 15:32, Michael Torrie wrote:
>> Are you saying that dealing with Unicode in Google Go, which
>> uses UTF-8 in memory, is adding an extra layer of complexity
>> and makes things worse than they might be in Python?
>
> I'm not familiar with Go.  If the programmer has to be aware
> that she is using UTF-8 under the hood, then yes, it does
> add an extra layer of complexity.  You have to remember the
> rules of UTF-8 as well as everything else.

Go represents strings as sequences of bytes. It provides separate
APIs that let you treat those bytes either as plain old bytes or
as a sequence of runes (not-necessarily-normalized code points).
If your byte strings aren't in UTF-8, then Go Away.

https://blog.golang.org/strings
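
To make the bytes-versus-runes distinction concrete, here is a minimal
sketch using only the standard library. The helper name
byteAndRuneCounts and the sample string are my own choices for
illustration, not anything from the blog post. The sample is "e"
followed by U+0301 (combining acute accent), so what a reader sees as
one character is two runes and three bytes.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteAndRuneCounts is a hypothetical helper: it returns the length of s
// in bytes and in runes (decoded code points).
func byteAndRuneCounts(s string) (int, int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	// One visible character: "e" plus U+0301 COMBINING ACUTE ACCENT.
	s := "e\u0301"

	// Indexing treats the string as plain bytes.
	for i := 0; i < len(s); i++ {
		fmt.Printf("byte %d: %#x\n", i, s[i])
	}

	// range decodes UTF-8 and yields each rune with its starting byte offset.
	for i, r := range s {
		fmt.Printf("rune at byte offset %d: %U\n", i, r)
	}

	nBytes, nRunes := byteAndRuneCounts(s)
	fmt.Printf("%d bytes, %d runes\n", nBytes, nRunes)
}
```

Neither view gives you the grapheme cluster: as far as I know, grouping
runes into user-perceived characters needs segmentation rules that
these built-in string operations do not apply.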

-- 
Neil Cerutti



