imaplib: is this really so unwieldy?

Tue May 25 13:38:54 EDT 2021

On 2021-05-25, MRAB <python at mrabarnett.plus.com> wrote:
> On 2021-05-25 16:41, Dennis Lee Bieber wrote:

>> In Python 3, strings are UNICODE, using 1, 2, or 4 bytes PER
>> CHARACTER (I don't recall if there is a 3-byte version). If your
>> input bytes are all 7-bit ASCII, then they map directly to a 1-byte
>> per character string. If they contain any 8-bit upper half
>> character they may map into a 2-byte per character string.
>> 
> In CPython 3.3+:
>
> U+0000..U+00FF are stored in 1 byte.
> U+0100..U+FFFF are stored in 2 bytes.
> U+010000..U+10FFFF are stored in 4 bytes.

Are all characters in a string stored with the same "width"? IOW, does
the presence of one Unicode character in the range U+010000..U+10FFFF
in a string that is otherwise all 7-bit ASCII values result in the
entire string being stored 4-bytes per character? Or is the storage
width variable within a single string?
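
One way to probe this empirically is sketched below. It assumes CPython
3.3+ and assumes that sys.getsizeof() reflects the string's internal
character buffer; bytes_per_char is a made-up helper name, not anything
from the standard library. It measures how much a string grows when more
ASCII filler is appended after a given first character, so the fixed
object header cancels out.

```python
import sys

def bytes_per_char(prefix, filler="a", n=1000):
    """Estimate per-character storage from size growth (hypothetical helper).

    Builds two strings that start with the same prefix and differ only
    in how many ASCII filler characters follow, then divides the size
    difference by the number of extra characters.
    """
    small = sys.getsizeof(prefix + filler * n)
    large = sys.getsizeof(prefix + filler * (2 * n))
    return (large - small) / n

if __name__ == "__main__":
    # Each case is ASCII filler preceded by at most one "wider" character.
    cases = [
        ("ASCII only", ""),
        ("one U+00E9 (Latin-1 range)", "\u00e9"),
        ("one U+20AC (BMP)", "\u20ac"),
        ("one U+1F600 (beyond BMP)", "\U0001F600"),
    ]
    for label, prefix in cases:
        print(f"{label}: {bytes_per_char(prefix)} bytes per added 'a'")
```

If the width were variable within a string, every case should report
about 1 byte per added 'a'. As far as I understand CPython's PEP 393
representation, the width is instead fixed per string by its widest code
point, so the last two cases should report 2 and 4. Other Python
implementations may behave differently.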

--
Grant