A few questions about encoding

Νικόλαος Κούρας support at superhost.gr
Thu Jun 13 10:19:47 EDT 2013


On 13/6/2013 2:49 PM, Steven D'Aprano wrote:

Please confirm whether these statements are true:

A code point and the code point's ordinal value are associated in the
Unicode character set. They have a so-called 1:1 mapping.
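A quick way to look at that 1:1 mapping in Python 3 is with the built-ins ord() and chr(), which convert between a one-character string and its integer code point. This is only a sketch; the number 16474 (U+405A) is chosen because it appears in the hex() example quoted further down.

```python
# A Python 3 str is a sequence of code points; ord() and chr()
# convert between a one-character string and its integer code point.
ch = chr(0x405A)          # the character whose code point is U+405A
n = ord(ch)               # back to the integer

print(n)                  # 16474
print(hex(n))             # 0x405a
print(f"U+{n:04X}")       # U+405A  (conventional Unicode notation)

# The round trip is lossless in both directions.
assert chr(ord(ch)) == ch
assert ord(chr(n)) == n
```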

So I was under the impression that encoding the code point into UTF-8
was the same as encoding the code point's ordinal value into UTF-8.

Now I believe they are two different things.
The code point *is what actually* needs to be encoded, *not* its
ordinal value.
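To test the statements above, it may help to compare two operations that really do give different bytes: encoding the character whose code point is 16474, and encoding the decimal text "16474". This sketch assumes Python 3, where str.encode() works from the code point number of each character.

```python
n = 16474
ch = chr(n)

# Encoding the character: UTF-8 turns code point U+405A into a
# 3-byte sequence (code points U+0800..U+FFFF take 3 bytes in UTF-8).
char_bytes = ch.encode('utf-8')
print(char_bytes)                        # b'\xe4\x81\x9a'

# Encoding the decimal text "16474": five ASCII digit characters,
# each of which becomes one byte.
digit_bytes = str(n).encode('utf-8')
print(digit_bytes)                       # b'16474'

# Decoding the 3-byte sequence gives back the same code point number.
print(ord(char_bytes.decode('utf-8')))   # 16474
```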


> The leading 0b is just syntax to tell you "this is base 2, not base 8
> (0o) or base 10 or base 16 (0x)". Also, leading zero bits are dropped.
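The prefixes described in the quote can be checked interactively. A small sketch, again using 16474: bin(), oct() and hex() return strings with the 0b/0o/0x prefix, literals written with those prefixes are all the same int, and a format spec can put back the leading zeros that bin() drops.

```python
n = 16474
print(bin(n), oct(n), hex(n))   # 0b100000001011010 0o40132 0x405a

# Literals in base 2, 8, 10 and 16 all denote the same int.
assert 0b100000001011010 == 0o40132 == 16474 == 0x405a

# bin() drops leading zero bits; '08b' pads to 8 binary digits.
print(bin(5))                   # 0b101
print(format(5, '08b'))         # 00000101
```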

But bytes objects are displayed with '\x' instead of the aforementioned
'0x' prefix. Why is that?
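As far as Python's syntax goes, 0x is the prefix for integer literals (and what hex() returns), while \x is an escape sequence inside str and bytes literals that stands for one byte given by exactly two hex digits. A bytes object displays with \x escapes because its repr is written as a bytes literal; printable ASCII bytes are shown as the characters themselves. A minimal sketch:

```python
# 0x...: an integer literal; ints always display in decimal.
n = 0x5A
print(n)                  # 90

# \x..: an escape inside a bytes literal, one byte per escape.
data = bytes([0x41, 0x5A, 0xE4])
print(data)               # b'AZ\xe4'  (0x41 is 'A', 0x5A is 'Z')
print(data[2])            # 228  (indexing bytes gives an int)
print(b'\x5a' == b'Z')    # True
```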

> ints always display in decimal. The only way to display in another base
> is to build a string showing what the int would look like in a different
> base:
>
> py> hex(16474)
> '0x405a'
>
> Notice that the return value of bin, oct and hex are all strings. If they
> were ints, then they would display in decimal, defeating the purpose!

Thank you, I didn't know that! Indeed, it works like this.

To encode a number, we first have to turn it into a string.

>>> "16474".encode('utf-8')
b'16474'
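Converting to text is one route. For comparison, the built-in int.to_bytes() stores the number's value directly as raw bytes, without going through decimal digits. The 2-byte length and big-endian byte order below are assumptions chosen only for this example.

```python
n = 16474

# Route 1: turn the number into decimal text, then encode that text.
as_text = str(n).encode('utf-8')
print(as_text)                      # b'16474'  (5 bytes, one per digit)

# Route 2: pack the integer value itself into 2 bytes, big-endian.
as_int = n.to_bytes(2, byteorder='big')
print(as_int)                       # b'@Z'  (0x40 is '@', 0x5A is 'Z')

# Unpacking recovers the original number.
print(int.from_bytes(as_int, byteorder='big'))   # 16474
```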

That 'b' stands for bytes.
How can I view this bytes object's representation in hex or in binary, as with hex() or bin()?
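One possible answer, as a sketch: iterating over a bytes object yields ints in the range 0..255, so each byte can be passed to hex() or formatted in binary. bytes.hex() is assumed to be available, which requires Python 3.5 or later (binascii.hexlify() does a similar job on older versions).

```python
data = "16474".encode('utf-8')

# Hex: bytes.hex() returns one two-digit hex pair per byte.
print(data.hex())                                # 3136343734

# Per byte, with the 0x prefix that hex() adds.
print([hex(b) for b in data])   # ['0x31', '0x36', '0x34', '0x37', '0x34']

# Binary: format each byte as exactly 8 binary digits.
print(' '.join(format(b, '08b') for b in data))
# 00110001 00110110 00110100 00110111 00110100
```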

============
Also:
 >>> len('0b100000001011010')
17

You said this string consists of 17 chars.
Why does the leading '0b' syntax count as bits as well? Shouldn't it be
15 bits instead of 17?
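A sketch of the distinction: len() counts the characters of the string returned by bin(), and that string includes the two prefix characters '0' and 'b'. The number of bits can instead be taken from int.bit_length() or from format(n, 'b'), which omits the prefix.

```python
n = 16474
s = bin(n)
print(s)                 # 0b100000001011010
print(len(s))            # 17  (includes the characters '0' and 'b')
print(len(s) - 2)        # 15  (binary digits only)
print(n.bit_length())    # 15
print(format(n, 'b'))    # 100000001011010  (no prefix)
```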






More information about the Python-list mailing list