A few questions about encoding

Nick the Gr33k support at superhost.gr
Fri Jun 14 02:59:59 EDT 2013


On 14/6/2013 4:00 am, Cameron Simpson wrote:
> On 13Jun2013 17:19, Nikos as SuperHost Support <support at superhost.gr> wrote:
> | A code-point and the code-point's ordinal value are associated into
> | a Unicode charset. They have the so called 1:1 mapping.
> |
> | So, i was under the impression that by encoding the code-point into
> | utf-8 was the same as encoding the code-point's ordinal value into
> | utf-8.
> |
> | So, now i believe they are two different things.
> | The code-point *is what actually* needs to be encoded and *not* its
> | ordinal value.
>
> Because there is a 1:1 mapping, these are the same thing: a code
> point is directly _represented_ by the ordinal value, and the ordinal
> value is encoded for storage as bytes.

So, you are saying that:

chr(16474).encode('utf-8')   #being the code-point encoded

ord(chr(16474)).encode('utf-8')     # being the code-point's ordinal encoded, which gives an error

That shows us that a character is what is being encoded to utf-8, but 
the character's ordinal cannot be.

So, why do you say "...and the ordinal value is encoded for storage as 
bytes"?
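
For reference, a minimal sketch of the two expressions above, assuming Python 3. The error comes from int objects having no .encode() method, not from the ordinal being unencodable. The helper utf8_3byte is a hypothetical name introduced only for illustration; it computes the UTF-8 bytes directly from the ordinal for the 3-byte range and arrives at the same bytes that str.encode() produces.

```python
ch = chr(16474)                  # the character whose ordinal is 16474 (U+405A)
ordinal = ord(ch)                # back to the integer 16474

encoded = ch.encode('utf-8')     # str objects have .encode()

try:
    ordinal.encode('utf-8')      # int objects have no .encode() method
    int_error = None
except AttributeError as exc:
    int_error = exc              # 'int' object has no attribute 'encode'


def utf8_3byte(cp):
    """Illustrative helper: UTF-8 for ordinals in the 3-byte range only.

    It does not reject the surrogate range, which a real encoder would.
    """
    if not 0x0800 <= cp <= 0xFFFF:
        raise ValueError('this sketch only handles the 3-byte range')
    return bytes([
        0b11100000 | (cp >> 12),           # lead byte carries the top 4 bits
        0b10000000 | ((cp >> 6) & 0x3F),   # continuation byte, middle 6 bits
        0b10000000 | (cp & 0x3F),          # continuation byte, low 6 bits
    ])


manual = utf8_3byte(ordinal)
print(encoded)             # b'\xe4\x81\x9a'
print(manual == encoded)   # True
print(int_error)
```

So the bytes are a function of the ordinal value; the str type is simply where Python puts the .encode() method.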


> | > The leading 0b is just syntax to tell you "this is base 2, not base 8
> | > (0o) or base 10 or base 16 (0x)". Also, leading zero bits are dropped.
> |
> | But byte objects are represented as '\x' instead of the
> | aforementioned '0x'. Why is that?
>
> You're confusing a "string representation of a single number in
> some base (eg 2 or 16)" with the "string-ish representation of a
> bytes object".

 >>> bin(16474)
'0b100000001011010'
that is a binary format string representation of number 16474, yes?

 >>> hex(16474)
'0x405a'
that is a hexadecimal format string representation of number 16474, yes?

WHILE:

b'abc\x1b\n' = a string representation of a byte, which in turn is a 
series of integers, so that makes this a string representation of 
integers, is this correct?

\x1b = ESC character

\ = for separating bytes
x = to flag that the following bytes are going to be represented as hex 
values? What exactly does 'x' mean here? Character, perhaps?

Still, it's not clear in my head what the difference between '0x1b' and 
'\x1b' is:

i think:
0x1b = an integer represented in hex format

\x1b = a character represented in hex format

Is this true?
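
A small sketch of the four spellings side by side, assuming Python 3 (the variable names are illustrative only). 0x1b in source code is an int literal, '\x1b' is a one-character str, b'\x1b' is a one-byte bytes object, and the output of hex() is an ordinary four-character str.

```python
n = 0x1b        # int literal written in hex; the same number as 27
s = '\x1b'      # str of length 1: the character with ordinal 0x1b (ESC)
b = b'\x1b'     # bytes of length 1: the single byte value 0x1b
h = hex(27)     # str of length 4: the characters '0', 'x', '1', 'b'

print(n, type(n).__name__)    # 27 int
print(len(s), ord(s))         # 1 27
print(len(b), b[0])           # 1 27
print(len(h), h)              # 4 0x1b

# All four spellings lead back to the same number
print(int(h, 16) == n == ord(s) == b[0])   # True
```

In a str or bytes literal, the backslash starts an escape sequence and \x means "the next two hex digits give the value"; it is not a separator between bytes.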




> | How can i view this byte's object representation as hex() or as bin()?
>
> See above. A bytes is a _sequence_ of values. hex() and bin() print
> individual values in hexadecimal or binary respectively.

 >>> for value in b'\x97\x98\x99\x27\x10':
...     print(value, hex(value), bin(value))
...
151 0x97 0b10010111
152 0x98 0b10011000
153 0x99 0b10011001
39 0x27 0b100111
16 0x10 0b10000


 >>> for value in b'abc\x1b\n':
...     print(value, hex(value), bin(value))
...
97 0x61 0b1100001
98 0x62 0b1100010
99 0x63 0b1100011
27 0x1b 0b11011
10 0xa 0b1010


Why do these two give different values when printed?
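
A minimal sketch comparing the two literals, assuming Python 3 (names are illustrative). The loops print different numbers because the literals hold different byte values: \x97 is the byte 0x97, which is 151 in decimal, whereas the letter a is the byte 0x61, which is 97 in decimal.

```python
first = b'\x97\x98\x99\x27\x10'    # bytes 0x97, 0x98, 0x99, 0x27, 0x10
second = b'abc\x1b\n'              # bytes 0x61, 0x62, 0x63, 0x1b, 0x0a

print(list(first))     # [151, 152, 153, 39, 16]
print(list(second))    # [97, 98, 99, 27, 10]

# The same five values as `second`, spelled two other ways
from_decimal = bytes([97, 98, 99, 27, 10])
from_hex = b'\x61\x62\x63\x1b\x0a'
print(second == from_decimal == from_hex)    # True

# A hex view of the whole object, two digits per byte
hex_view = ' '.join(format(value, '02x') for value in second)
print(hex_view)        # 61 62 63 1b 0a
```
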
-- 
What is now proved was at first only imagined!


