imaplib: is this really so unwieldy?

Tue May 25 13:25:47 EDT 2021

On 2021-05-25 16:41, Dennis Lee Bieber wrote:
> On Tue, 25 May 2021 10:23:41 +0200, hw <hw at adminart.net> declaimed the
> following:
> 
>>
>>So I'm forced to convert stuff from bytes to strings (which is weird 
>>because bytes are bytes) and to use regular expressions to extract the 
>>message-uids from what the functions return (which I shouldn't have to 
>>because when I'm asking a function to give me a uid, I expect it to 
>>return a uid).
>>
> 	In Python 3, strings are UNICODE, using 1, 2, or 4 bytes PER CHARACTER
> (I don't recall if there is a 3-byte version). If your input bytes are all
> 7-bit ASCII, then they map directly to a 1-byte per character string. If
> they contain any 8-bit upper half character they may map into a 2-byte per
> character string.
> 
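On the original complaint about regexes and converting to str: here is a minimal sketch that pulls UIDs out of the raw bytes without decoding first. It assumes the usual imaplib return shapes. `IMAP4.uid('SEARCH', None, 'ALL')` gives data like `[b'3 7 12']`, and a `FETCH ... (UID)` response line looks like `b'1 (UID 42)'`. The helper names `uids_from_search` and `uid_from_fetch_line` are hypothetical and are not part of imaplib.

```python
from __future__ import annotations

import re

# Matches "UID <digits>" inside a FETCH response line; note the rb"" pattern,
# so the regex works on bytes directly with no decode step.
UID_RE = re.compile(rb"UID (\d+)")


def uids_from_search(data: list[bytes]) -> list[int]:
    """Parse the data part of a (typ, data) SEARCH / UID SEARCH result.

    Assumes the usual shape [b'3 7 12']: one bytes item of space-separated
    numbers (b'' when nothing matched).
    """
    # int() accepts ASCII digits given as bytes, e.g. int(b'42') == 42.
    return [int(tok) for item in data if item for tok in item.split()]


def uid_from_fetch_line(line: bytes) -> int | None:
    """Extract the UID from one FETCH response line such as b'1 (UID 42)'."""
    m = UID_RE.search(line)
    return int(m.group(1)) if m else None


print(uids_from_search([b"3 7 12"]))        # [3, 7, 12]
print(uid_from_fetch_line(b"1 (UID 42)"))   # 42
```

Working in bytes here is a design choice rather than a requirement. The numeric parts of these responses are plain ASCII, so `.decode('ascii')` followed by ordinary str handling would work just as well.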
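In CPython 3.3+ (PEP 393), all the characters of a given string are stored
with the same width, chosen by the highest code point in that string:

U+0000..U+00FF are stored in 1 byte.
U+0100..U+FFFF are stored in 2 bytes.
U+10000..U+10FFFF are stored in 4 bytes.

So the 8-bit upper half (U+0080..U+00FF) still takes 1 byte per character,
and there is no 3-byte width.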
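The widths above can be observed with `sys.getsizeof`. This is a CPython-specific sketch: it compares two strings that differ by one character so that the fixed object header cancels out. The helper `bytes_per_char` is hypothetical, and the exact sizes are an implementation detail rather than a language guarantee.

```python
import sys


def bytes_per_char(ch: str, n: int = 100) -> int:
    """Per-character storage width of a string made only of `ch` (CPython)."""
    # Both strings have the same header; they differ by exactly one character.
    return sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch * n)


for ch in ("a", "\u00e9", "\u0101", "\U0001F600"):
    print(f"U+{ord(ch):04X}: {bytes_per_char(ch)} byte(s) per character")
# On CPython: U+0061: 1, U+00E9: 1, U+0101: 2, U+1F600: 4

# One wide character widens the whole string: appending a plain "a" to a
# string that already contains U+1F600 costs 4 bytes, not 1.
mixed = "a" * 100 + "\U0001F600"
print(sys.getsizeof(mixed + "a") - sys.getsizeof(mixed))  # 4 on CPython
```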

> 	Bytes in Python 3 are just a binary stream, which needs an encoding to
> produce characters. Using the wrong encoding (say ISO-Latin-1) when the data
> is really UTF-8 results in garbage.
> 
>
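Here is a short illustration of the wrong-encoding point, using an arbitrary sample word. Latin-1 maps every byte value 0..255 to a character, so decoding UTF-8 data as Latin-1 never raises. It just silently produces mojibake. The reverse mistake usually fails loudly.

```python
raw = "café".encode("utf-8")     # b'caf\xc3\xa9': "é" is two bytes in UTF-8
print(raw.decode("utf-8"))       # café
print(raw.decode("latin-1"))     # cafÃ© (each UTF-8 byte became its own character)

# Latin-1 bytes b'caf\xe9' are not valid UTF-8, so this raises instead.
try:
    "café".encode("latin-1").decode("utf-8")
except UnicodeDecodeError as exc:
    print("not UTF-8:", exc.reason)
```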