Glyphs and graphemes [was Re: Cult-like behaviour]

Marko Rauhamaa marko at pacujo.net
Mon Jul 16 16:54:35 EDT 2018


Chris Angelico <rosuav at gmail.com>:
> Challenge: Reverse a string in UTF-8.

Counter-challenge: Reverse a Unicode string:

   >>> s = "a\u0304e"
   >>> s
   'āe'
   >>> L = list(s)
   >>> L.reverse()
   >>> "".join(L)
   'ēa'
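
For what it's worth, the macron can be kept on the "a" by reversing clusters instead of code points. This is only a minimal sketch: reverse_clusters is a made-up helper, and it assumes that "base character plus trailing combining marks" is a good enough stand-in for a grapheme. The real segmentation rules (UAX #29) also cover Hangul jamo, emoji sequences and more, none of which is handled here.

```python
import unicodedata

def reverse_clusters(text):
    """Reverse text while keeping combining marks attached to their base.

    Hypothetical helper: approximates a grapheme as one base character
    followed by any code points with a nonzero canonical combining class.
    """
    clusters = []
    for ch in text:
        # A nonzero combining class means ch modifies the preceding character.
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return "".join(reversed(clusters))

s = "a\u0304e"
r = reverse_clusters(s)  # "e" followed by "a" + U+0304
```

With this, r puts the "e" first and leaves U+0304 behind the "a" it started on.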

> Challenge: Center text in UTF-8.

Counter-challenge: Center a Unicode string:

   >>> t = s * 3
   >>> t
   'āeāeāe'
   >>> t.center(9)
   'āeāeāe'
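
str.center pads nothing here because it counts nine code points, although only six columns are displayed. A rough sketch of centering by displayed width follows; display_width and center_clusters are hypothetical names, and the width estimate assumes every non-combining code point occupies exactly one column, which is false for wide East Asian characters and zero-width code points.

```python
import unicodedata

def display_width(text):
    # Approximation: every code point that is not a combining mark
    # is assumed to take exactly one column.
    return sum(1 for ch in text if not unicodedata.combining(ch))

def center_clusters(text, width, fill=" "):
    """Center text in width columns using the approximate display width."""
    pad = max(0, width - display_width(text))
    left = pad // 2
    # Any odd leftover column goes to the right-hand side.
    return fill * left + text + fill * (pad - left)

t = "a\u0304e" * 3
centered = center_clusters(t, 9)  # one space before, two after
```

Which side receives the odd leftover column is an arbitrary choice in this sketch and need not match what str.center does.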

> Challenge: Given a (non-initial) character in a buffer of UTF-8 bytes,
> find the immediately preceding character.

The counter-challenge is left as an exercise for the reader.
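
One possible sketch of both the challenge and its counterpart is below. Both function names are invented for illustration. The bytes version assumes well-formed UTF-8 and an offset that sits on a code point boundary; the str version assumes the index sits on a cluster boundary and, again, treats only combining marks as cluster continuations.

```python
import unicodedata

def prev_code_point_start(buf, i):
    """Byte offset where the code point preceding offset i starts."""
    i -= 1
    # UTF-8 continuation bytes look like 10xxxxxx; skip back over them.
    while i > 0 and (buf[i] & 0xC0) == 0x80:
        i -= 1
    return i

def prev_cluster_start(text, i):
    """Index where the base-plus-marks cluster preceding index i starts."""
    i -= 1
    # Skip back over combining marks to reach their base character.
    while i > 0 and unicodedata.combining(text[i]):
        i -= 1
    return i

s = "a\u0304e"
b = s.encode("utf-8")                 # b'a\xcc\x84e'
byte_start = prev_code_point_start(b, 3)  # offset of U+0304, not of "a"
text_start = prev_cluster_start(s, 2)     # index of "a"
```

The two loops have the same shape; they differ only in which unit they skip back over.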

> All of these are fundamentally difficult by nature, but if you index
> by code points, you eliminate one level of difficulty; indexing by
> bytes retains all the existing difficulty and adds another layer.

Oh, sorry. I thought you were suggesting Unicode strings would make the
challenges somehow easy.


Marko


