Cult-like behaviour [was Re: Kindness]

Marko Rauhamaa marko at pacujo.net
Mon Jul 16 16:36:17 EDT 2018


Chris Angelico <rosuav at gmail.com>:

> On Tue, Jul 17, 2018 at 5:40 AM, Marko Rauhamaa <marko at pacujo.net> wrote:
>> You mean each code point is one code point wide. But that's rather an
>> irrelevant thing to state. The main point is that UTF-32 (aka
>> Unicode) uses one or more code points to represent what people would
>> consider an individual character.
>
> No, each code point is one code unit wide. It's not irrelevant.

Finally, we have reached the simple crux of the debate, and that's where
you and I disagree.

Unicode code points certainly express many more things than UTF-8 bytes
do. A single UTF-8 byte can represent only the first 128 code points of
Unicode. However, even Unicode has given up on representing basic
everyday symbols with single code points, which leads back to the
question of how Python 3's Unicode strings are superior to Python 2's
UTF-8 strings. They have the same upsides and downsides.
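
For concreteness, here is a small Python 3 sketch of the three counts
being argued about: UTF-8 bytes, code points, and what a reader would
call a character. The sample strings and variable names are my own
illustration, not anything from the thread, and the "characters a reader
sees" figures in the comments are an assumption about typical rendering.

```python
import unicodedata

# "e" followed by COMBINING ACUTE ACCENT: displayed as one character.
decomposed = "e\u0301"
# NFC normalization folds the pair into the precomposed U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

# Two REGIONAL INDICATOR symbols (F, I): usually displayed as one flag.
# There is no precomposed form, so NFC leaves the string unchanged.
flag = "\U0001F1EB\U0001F1EE"

def counts(s):
    """Return (code points, UTF-8 bytes, UTF-32 code units) for s."""
    return (
        len(s),                           # Python 3 str length is in code points
        len(s.encode("utf-8")),           # variable width: 1 to 4 bytes each
        len(s.encode("utf-32-le")) // 4,  # fixed width: one unit per code point
    )

decomposed_counts = counts(decomposed)  # (2, 3, 2)
composed_counts = counts(composed)      # (1, 2, 1)
flag_counts = counts(flag)              # (2, 8, 2)

for label, value in [("decomposed", decomposed_counts),
                     ("composed", composed_counts),
                     ("flag", flag_counts)]:
    print(label, value)
```

The UTF-32 column always equals the code point column, which is the
"one code unit per code point" property. Neither column matches the
count of displayed characters for the decomposed accent or the flag.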


Marko


