binascii.b2a vs ord()

MRAB python at mrabarnett.plus.com
Sun Jan 10 16:10:20 EST 2021


On 2021-01-10 18:50, Dennis Lee Bieber wrote:
> On Sun, 10 Jan 2021 03:29:37 -0000 (UTC), Bischoop <Bischoop at vimart.net>
> declaimed the following:
> 
>>I wanted to learn about converting a string to ASCII.
>>So I learned about binascii.b2a, but because the output wasn't what I
>>wanted I dug deeper and found out about ord(c), and actually that's what
>>I expected.
>>So what is binascii, and why can't I convert the ASCII that I got from ord
>>back to a string by using char? Instead I'm getting some strings in weird
>>coding.
>>
>>
>>import binascii
>>Ttext = b'This is a string'
>>text2 = 'This is a string'
> 
> 	Item: this is a Unicode string. Python Unicode strings are only 1-byte
> per character IF all characters are in the 7-bit ASCII range. If you have
> any extended characters (which would, say, be one byte in ISO-Latin-1) they
> could turn the entire Unicode string into 2-byte per character (and really
> expanded sets could be 3 or 4 bytes per character).
> 
[snip]

Are you confusing the internal representation in CPython 3.3+ with UTF-8?

In CPython 3.3+, Unicode strings are stored as 1 byte per codepoint if 
all of the codepoints are U+0000..U+00FF, else as 2 bytes per codepoint 
if all are U+0000..U+FFFF, else as 4 bytes per codepoint.

But that's an implementation detail.
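
If you want to see the difference for yourself, here is a rough sketch, assuming CPython 3.3+ (other implementations may report different sizes). The helper name bytes_per_codepoint is made up for this illustration. It estimates the internal width by comparing sys.getsizeof() for two strings of different lengths, so the fixed object header cancels out, and prints the UTF-8 encoded length of the same character alongside it.

```python
import sys


def bytes_per_codepoint(ch: str) -> int:
    """Estimate the internal storage width for a string made only of `ch`.

    Hypothetical helper: relies on CPython's sys.getsizeof(), so the
    result is specific to that implementation.
    """
    # Subtracting the sizes of a 100- and a 200-codepoint string cancels
    # the fixed per-object overhead and leaves 100 codepoints' worth.
    return (sys.getsizeof(ch * 200) - sys.getsizeof(ch * 100)) // 100


for ch in ("a", "\u00ff", "\u0100", "\uffff", "\U00010000"):
    internal = bytes_per_codepoint(ch)
    utf8 = len(ch.encode("utf-8"))
    print(f"U+{ord(ch):04X}: internal={internal} utf-8={utf8}")
```

On CPython, U+00FF should show 1 byte internally but 2 bytes in UTF-8, and U+FFFF should show 2 bytes internally but 3 bytes in UTF-8, which is why the two should not be confused.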

