Glyphs and graphemes [was Re: Cult-like behaviour]

Marko Rauhamaa marko at pacujo.net
Mon Jul 16 15:51:32 EDT 2018


Steven D'Aprano <steve+comp.lang.python at pearwood.info>:
> Under that standard definition, UTF-8 and UTF-16 are variable-width,
> and UTF-32 is fixed-width.
>
> But I'll accept that UTF-32 is variable-width if Marko accepts that
> ASCII is too.

If that makes you happy, fine. The point is, UTF-32 has no advantages
over UTF-8. And I'm referring to the text abstraction as seen by the
programmer. It has nothing to do with the layout of bytes inside
CPython.
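
To make the abstraction point concrete, here is a minimal sketch (not
from the original post; the sample strings are chosen only for
illustration). It shows that one user-perceived character can span
several code points, so a fixed-width code unit does not give you
fixed-width graphemes.

```python
import unicodedata

# "e" followed by COMBINING ACUTE ACCENT: one grapheme, two code points.
s = "e\u0301"

code_points = len(s)                     # 2 code points
utf8_len = len(s.encode("utf-8"))        # 1 + 2 = 3 bytes
utf32_len = len(s.encode("utf-32-le"))   # 2 * 4 = 8 bytes (no BOM)

# NFC normalization happens to merge this pair into U+00E9 ...
composed = unicodedata.normalize("NFC", s)

# ... but many graphemes have no single-code-point form at all, for
# example a flag built from two regional indicator symbols (F + I).
flag = "\U0001F1EB\U0001F1EE"
flag_code_points = len(flag)                      # still 2 after NFC
flag_utf8_len = len(flag.encode("utf-8"))         # 2 * 4 = 8 bytes
flag_utf32_len = len(flag.encode("utf-32-le"))    # 2 * 4 = 8 bytes
```

In both encodings, code that wants to step through the text one
grapheme at a time has to look at more than one code unit.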

I use UTF-8 in my C programs and sense no disadvantage. I have never
felt a need for wchar_t. Similarly, I had a small Python 2 program that
quizzed me about Hebrew vocabulary with Finnish translations and
Esperanto pronunciation instructions. All UTF-8. No unicode strings. (I
*have* converted that to Python 3 just to be on the bleeding edge, but
it didn't give me any advantages over Python 2.)
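
As a rough sketch of what working in UTF-8 throughout can look like in
Python 3, the following keeps the data as bytes and only decodes for
display. The quiz entry and its "word = translation" layout are
hypothetical; the original program's data format is not shown.

```python
# Hypothetical quiz entry: Hebrew word (written with escapes to avoid
# bidirectional display issues), a separator, and a Finnish translation.
line = "\u05e9\u05dc\u05d5\u05dd = rauha".encode("utf-8")

# Splitting and searching work directly on the encoded bytes. UTF-8 is
# self-synchronizing, so a valid encoded substring never matches in the
# middle of another character.
hebrew, finnish = line.split(b" = ")
offset = line.find(b"rauha")   # byte offset, not a character index

# Decode only at the edge, e.g. when printing to a terminal.
hebrew_text = hebrew.decode("utf-8")
```

The offsets here count bytes rather than characters, which is enough
for splitting, searching, and concatenation.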


Marko


