A few questions about encoding

Thu Jun 13 10:19:47 EDT 2013

On 13/6/2013 2:49 PM, Steven D'Aprano wrote:

Please confirm that these statements are true:

A code point and the code point's ordinal value are associated in the 
Unicode charset. They have a so-called 1:1 mapping.

So, I was under the impression that encoding the code point into 
UTF-8 was the same as encoding the code point's ordinal value into UTF-8.

Now I believe they are two different things.
The code point *is what actually* needs to be encoded, and *not* its 
ordinal value.
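
To make the distinction concrete, here is a small sketch, assuming Python 3. The Greek letter alpha (U+03B1) and the variable names are my own choices for illustration, not something from the discussion above:

```python
# Assumes Python 3, where a str is a sequence of Unicode code points.
ch = '\u03b1'                                   # example character: Greek small alpha
ordinal = ord(ch)                               # the code point's ordinal value (an int)
encoded_char = ch.encode('utf-8')               # encode the character itself
encoded_digits = str(ordinal).encode('utf-8')   # encode the decimal digits of the ordinal

print(ordinal)          # 945
print(encoded_char)     # b'\xce\xb1'
print(encoded_digits)   # b'945'
```

The first result is the two-byte UTF-8 form of the character; the second is just the three ASCII digits of the number.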

 > The leading 0b is just syntax to tell you "this is base 2, not base 8
 > (0o) or base 10 or base 16 (0x)". Also, leading zero bits are dropped.

But bytes objects are displayed with '\x' instead of the aforementioned 
'0x'. Why is that?
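
For reference, these are the two notations I am comparing. This is only a sketch, assuming Python 3, with the value 0xce picked arbitrarily:

```python
value = 0xce                     # an int written in base 16 (206 in decimal)
as_hex_string = hex(value)       # a str describing the whole int
as_bytes = bytes([value])        # a bytes object holding that single byte
as_bytes_repr = repr(as_bytes)   # how the interpreter displays the bytes object

print(as_hex_string)   # 0xce
print(as_bytes_repr)   # b'\xce'
```

As far as I can tell, '0x' prefixes a whole number, while '\x' appears once per byte inside the quotes.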

> ints always display in decimal. The only way to display in another base
> is to build a string showing what the int would look like in a different
> base:
>
> py> hex(16474)
> '0x405a'
>
> Notice that the return value of bin, oct and hex are all strings. If they
> were ints, then they would display in decimal, defeating the purpose!

Thank you, I didn't know that! Indeed, it works like this.
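
A quick check of this, assuming Python 3 (the variable names are mine):

```python
n = 16474
hex_str = hex(n)                 # build the base-16 text form of the int
bin_str = bin(n)                 # base-2 text form
oct_str = oct(n)                 # base-8 text form

print(type(hex_str))             # <class 'str'>
print(hex_str, bin_str, oct_str) # 0x405a 0b100000001011010 0o40132
print(int(hex_str, 16))          # back to an int, shown in decimal: 16474
```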

To encode a number we have to turn it into a string first.

"16474".encode('utf-8')
b'16474'

That 'b' stands for bytes.
How can I view this bytes object's contents in hex or binary, as hex() or bin() would show them?
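
One approach that might do this, assuming Python 3, where iterating over a bytes object yields ints in the range 0-255 (the names hex_view and bin_view are mine):

```python
data = "16474".encode('utf-8')            # b'16474'

# Each element of a bytes object is an int, so hex() and bin() apply per byte.
hex_view = [hex(byte) for byte in data]
bin_view = [bin(byte) for byte in data]

print(hex_view)   # ['0x31', '0x36', '0x34', '0x37', '0x34']
print(bin_view)   # ['0b110001', '0b110110', '0b110100', '0b110111', '0b110100']
```

These are the ASCII codes of the digit characters, not the number 16474 itself.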

============
Also:
 >>> len('0b100000001011010')
17

You said this string consists of 17 chars.
Why does the leading '0b' syntax count as bits as well? Shouldn't it be 15 
bits instead of 17?
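
To separate the two counts, here is a small sketch, assuming Python 3. The variable names are mine; int.bit_length() is a built-in method:

```python
s = '0b100000001011010'

total_chars = len(s)             # counts every character, including '0' and 'b'
digit_chars = len(s) - 2         # skip the two-character '0b' prefix
bits = int(s, 2).bit_length()    # bits needed to represent the value itself

print(total_chars, digit_chars, bits)   # 17 15 15
```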