Cult-like behaviour [was Re: Kindness]

Ian Kelly ian.g.kelly at gmail.com
Mon Jul 16 14:15:19 EDT 2018


On Mon, Jul 16, 2018 at 12:02 PM Terry Reedy <tjreedy at udel.edu> wrote:
>
> On 7/15/2018 5:28 PM, Marko Rauhamaa wrote:
>
> > if your new system used Python3's UTF-32 strings as a foundation,
>
> Since 3.3, Python's strings are not (always) UTF-32 strings.  Nor are
> they always UCS-2 (or partly UTF-16) strings.  Nor are they always
> Latin-1 or ASCII strings.  Python's Flexible String Representation uses
> the narrowest possible internal code for any particular string.  This is
> all transparent to the user except for memory size.
>
> In 3.2 and before, Python's Unicode strings were either wide (UTF-32) or
> narrow (UCS-2 + surrogates or UTF-16 minus full compliance).  The
> difference was sometimes not transparent, and code that worked on one
> build could fail on the other.  Since 3.3, string code should work the
> same on any machine running the same Python version.
>
> > UTF-32, after all, is a variable-width encoding.
>
> Nope.  It is a fixed-width (32 bits, 4 bytes) encoding.

Although it only really uses 21 (actually, more like 20.087) of those
bits. Given that and the similar naming, it's easy to see how people
sometimes confuse its structure with UTF-8.
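
For anyone who wants to check these numbers, here is a rough sketch. The
helper names (utf32_widths, utf8_widths, bytes_per_char) and the sample
string are made up for illustration. The bytes_per_char part relies on
sys.getsizeof and so assumes CPython 3.3 or later; other implementations
may store strings differently.

```python
import math
import sys

# The highest code point is U+10FFFF, so there are 0x110000 possible values.
CODE_POINTS = 0x110000

bits_needed = math.log2(CODE_POINTS)          # fractional bits of information
whole_bits = (CODE_POINTS - 1).bit_length()   # whole bits needed for U+10FFFF


def utf32_widths(text):
    """Bytes used per character in UTF-32 (little-endian, so no BOM)."""
    return [len(ch.encode("utf-32-le")) for ch in text]


def utf8_widths(text):
    """Bytes used per character in UTF-8."""
    return [len(ch.encode("utf-8")) for ch in text]


def bytes_per_char(ch, n=1000):
    """Estimate the internal storage per character for a string made only
    of ch, by comparing the sizes of two strings that differ by n
    characters. Assumption: CPython 3.3+ (Flexible String Representation).
    """
    return (sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch)) // n


# One character each from the ASCII, Latin-1, BMP and astral ranges.
sample = "a\u00e9\u20ac\U0001F600"

print(round(bits_needed, 3), whole_bits)
print(utf32_widths(sample))
print(utf8_widths(sample))
print([bytes_per_char(ch) for ch in sample])
```

On CPython this should print 20.087 and 21, then [4, 4, 4, 4] for UTF-32,
[1, 2, 3, 4] for UTF-8, and [1, 1, 2, 4] for the internal storage. UTF-32
is the only one of the three where the width never varies.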


