Glyphs and graphemes [was Re: Cult-like behaviour]

Chris Angelico rosuav at gmail.com
Mon Jul 16 17:05:24 EDT 2018


On Tue, Jul 17, 2018 at 6:54 AM, Marko Rauhamaa <marko at pacujo.net> wrote:
> Chris Angelico <rosuav at gmail.com>:
>> Challenge: Reverse a string in UTF-8.
>
> Counter-challenge: Reverse a Unicode string:
>
>    >>> s = "a\u0304e"
>    >>> s
>    'āe'
>    >>> L = list(s)
>    >>> L.reverse()
>    >>> "".join(L)
>    'ēa'
>
>> Challenge: Center text in UTF-8.
>
> Counter-challenge: Center a Unicode string:
>
>    >>> t = s * 3
>    >>> t
>    'āeāeāe'
>    >>> t.center(9)
>    'āeāeāe'
>
>> Challenge: Given a (non-initial) character in a buffer of UTF-8 bytes,
>> find the immediately preceding character.
>
> The counter-challenge is left as an exercise for the reader.
>
>> All of these are fundamentally difficult by nature, but if you index
>> by code points, you eliminate one level of difficulty; indexing by
>> bytes retains all the existing difficulty and adds another layer.
>
> Oh, sorry. I thought you were suggesting Unicode strings would make the
> challenges somehow easy.

So now that you've actually read my entire post, you'll see that there
are fundamental difficulties either way, but that indexing by UTF-8
bytes adds another layer on top of them. Great. Now go ahead and reply
to my post, knowing my actual point.
Congratulations on posting something of no value.
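
To make the "one more layer" point concrete, here is a minimal sketch of
the "find the preceding character" challenge, done once on UTF-8 bytes
and once on code points. The helper names prev_char_start and
prev_base_index are made up for illustration, the byte version assumes
the buffer is valid UTF-8, and the combining-mark check is a deliberate
simplification of real grapheme cluster segmentation.

```python
import unicodedata


def prev_char_start(buf: bytes, i: int) -> int:
    """Byte offset where the code point before offset i begins.

    Hypothetical helper. Assumes buf is valid UTF-8 and that i is the
    start of a non-initial code point.
    """
    if i <= 0:
        raise ValueError("no preceding character")
    j = i - 1
    # UTF-8 continuation bytes look like 10xxxxxx; step back over them.
    while j > 0 and (buf[j] & 0xC0) == 0x80:
        j -= 1
    return j


def prev_base_index(s: str, i: int) -> int:
    """Index of the base code point of whatever precedes index i.

    Hypothetical helper. Skipping code points with a nonzero canonical
    combining class is only an approximation of grapheme clusters.
    """
    if i <= 0:
        raise ValueError("no preceding character")
    j = i - 1
    while j > 0 and unicodedata.combining(s[j]):
        j -= 1
    return j


s = "a\u0304e"            # 'a', COMBINING MACRON, 'e'
buf = s.encode("utf-8")   # the macron takes two bytes

# Bytes: first work out where the previous code point starts...
byte_start = prev_char_start(buf, buf.index(b"e"))
# ...and you have still only found the combining macron, not the glyph.
previous_code_point = buf[byte_start:buf.index(b"e")].decode("utf-8")

# Code points: the previous code point is simply s[i - 1]; only the
# combining-mark problem is left.
glyph_start = prev_base_index(s, s.index("e"))
```

With code-point indexing, the first helper disappears entirely and only
the second problem remains. With byte indexing you need both.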

ChrisA
