Grapheme clusters, a.k.a. real characters

Terry Reedy tjreedy at udel.edu
Fri Jul 14 20:02:48 EDT 2017


On 7/14/2017 5:51 PM, Marko Rauhamaa wrote:

> Yes, in Python2, Go, C and GNU textutils, when you print a text string
> containing a mixture of languages, you see characters.
> 
> Why?
> 
> Because that's what the terminal emulator chooses to do upon receiving
> those bytes.

 >>> s = u'\u1171\u2222\u3333\u4444\u5555'
 >>> s
u'\u1171\u2222\u3333\u4444\u5555'
 >>> print(s)
ᅱ∢㌳䑄啕
 >>> b = s.encode('utf-8')
 >>> b
'\xe1\x85\xb1\xe2\x88\xa2\xe3\x8c\xb3\xe4\x91\x84\xe5\x95\x95'
 >>> print(b)
ᅱ∢㌳䑄啕

I prefer the accurate 5-character print of the text string to the print
of the bytes.
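
The same distinction can be sketched in Python 3, where text and bytes are
separate types. This is only a minimal illustration using the five code
points from the session above; `describe` is a hypothetical helper name
made up for this example, not part of any library.

```python
# Python 3 sketch: the same five code points as in the session above.
s = '\u1171\u2222\u3333\u4444\u5555'
b = s.encode('utf-8')


def describe(text):
    """Hypothetical helper: return (code point count, UTF-8 byte count)."""
    return len(text), len(text.encode('utf-8'))


print(len(s))                  # 5: the text string counts code points
print(len(b))                  # 15: each of these code points takes 3 bytes in UTF-8
print(b.decode('utf-8') == s)  # True: decoding the bytes recovers the text
```

The text string knows it holds 5 characters; the byte string only knows
it holds 15 bytes, and what appears on screen when those bytes are written
out is up to the terminal.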

-- 
Terry Jan Reedy




