UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb6 in position 0: invalid start byte

Νίκος nikos at superhost.gr
Thu Jul 4 08:06:47 EDT 2013


On 4/7/2013 2:52 PM, MRAB wrote:
> On 04/07/2013 12:29, Νίκος wrote:
>> On 4/7/2013 1:54 PM, Chris Angelico wrote:
>>> On Thu, Jul 4, 2013 at 8:38 PM, Νίκος <nikos at superhost.gr> wrote:
>>>> So you are also suggesting that what gethostbyaddr() returns is not
>>>> utf-8
>>>> encoded too?
>>>>
>>>> What character is 0xb6 anyways?
>>>
>>> It isn't. It's a byte. Bytes are not characters.
>>>
>>> http://www.joelonsoftware.com/articles/Unicode.html
>>
>> Well in case of utf-8 encoding for the first 127 codepoints we can safely
>> say that a character equals a byte :)
>>
> Equals? No. Bytes are not characters. (Strictly speaking, they're
> codepoints, not characters.)
>
> And anyway, it's the first _128_ codepoints.

Yes, 0-127 makes 128 codepoints, I knew that!

Well, the relationship between characters and bytes is this:

Unicode codepoints (characters) in the range [0-127] each need 1 byte to
be stored in UTF-8 encoding.

I think it's also correct to say that the byte in the above situation is
the representation of our character.
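
A quick check of this, as a minimal sketch assuming Python 3 (the
variable names are just for illustration): it encodes every codepoint
in 0-127 and confirms that each one becomes a single byte with the same
value, then shows that the mapping stops being one-to-one above 127 and
that a lone 0xb6 byte cannot be decoded as UTF-8 on its own.

```python
# Every codepoint in the range 0-127 encodes to exactly one byte in UTF-8,
# and the value of that byte equals the codepoint.
for cp in range(128):
    encoded = chr(cp).encode("utf-8")
    assert len(encoded) == 1 and encoded[0] == cp

# Above 127 the one-byte mapping ends: U+00B6 needs two bytes.
two_bytes = chr(0xB6).encode("utf-8")

# A lone 0xb6 byte has the bit pattern 10xxxxxx, which UTF-8 reserves for
# continuation bytes, so it cannot start a sequence.
try:
    b"\xb6".decode("utf-8")
    lone_byte_ok = True
    error_text = ""
except UnicodeDecodeError as exc:
    lone_byte_ok = False
    error_text = str(exc)

print(two_bytes)
print(lone_byte_ok, error_text)
```

So the byte is only "the character" after you know the encoding; the
same 0xb6 byte is a valid character in a single-byte encoding such as
ISO-8859-7 but is not valid by itself in UTF-8.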

-- 
What is now proved was at first only imagined!


