binascii.b2a vs ord()
MRAB
python at mrabarnett.plus.com
Sun Jan 10 16:10:20 EST 2021
On 2021-01-10 18:50, Dennis Lee Bieber wrote:
> On Sun, 10 Jan 2021 03:29:37 -0000 (UTC), Bischoop <Bischoop at vimart.net>
> declaimed the following:
>
>>I wanted to learn about converting a string to ASCII.
>>So I learned about binascii.b2a, but because the output wasn't what I
>>wanted I dug deeper and found out about ord(c), and actually that's what
>>I expected.
>>So what is binascii, and why can't I convert the ASCII that I got from ord
>>back to a string by using char? Instead I'm getting some strings in a weird
>>encoding.
>>
>>
>>import binascii
>>Ttext = b'This is a string'
>>text2 = 'This is a string'
>
> Item: this is a Unicode string. Python Unicode strings are only 1-byte
> per character IF all characters are in the 7-bit ASCII range. If you have
> any extended characters (which would, say, be one byte in ISO-Latin-1) they
> could turn the entire Unicode string into 2-byte per character (and really
> expanded sets could be 3 or 4 bytes per character).
>
[snip]
Are you confusing the internal representation in CPython 3.3+ with UTF-8?
In CPython 3.3+, Unicode strings are stored as 1 byte per codepoint if
all of the codepoints are U+0000..U+00FF, else as 2 bytes per codepoint
if all are U+0000..U+FFFF, else as 4 bytes per codepoint.
But that's an implementation detail.
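
A rough way to see the difference between the internal width and UTF-8
is sketched below. It assumes CPython 3.3+, since sys.getsizeof reports
implementation-specific numbers; bytes_per_codepoint is a hypothetical
helper made up for this illustration, not part of any library.

```python
import sys

def bytes_per_codepoint(ch, n=1000):
    """Estimate the per-codepoint storage CPython uses for a string of `ch`.

    Hypothetical helper: relies on sys.getsizeof, so the result is only
    meaningful on CPython 3.3+ (flexible string representation).
    """
    # Differencing two lengths cancels out the fixed object header.
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n

samples = {
    "ASCII 'a' (U+0061)": "a",
    "Latin-1 e-acute (U+00E9)": "\u00e9",
    "BMP euro sign (U+20AC)": "\u20ac",
    "astral U+1F600": "\U0001F600",
}

for label, ch in samples.items():
    internal = bytes_per_codepoint(ch)   # CPython's internal width
    utf8 = len(ch.encode("utf-8"))       # length of the UTF-8 encoding
    print(f"{label}: internal width {internal}, UTF-8 length {utf8}")
```

On a CPython 3.3+ build this should report internal widths of 1, 1, 2
and 4 against UTF-8 lengths of 1, 2, 3 and 4, so U+00E9 and U+20AC are
where the two schemes disagree. Other implementations may store strings
differently.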