[Tutor] character encoding

Eric Abrahamsen eric at ericabrahamsen.net
Wed Jul 9 05:05:48 CEST 2008


As for other resources, I recently came across this:
http://farmdev.com/talks/unicode/

This was the first explanation that really made me understand the  
difference between Unicode and utf-8 (and realize that I'd been using  
the terms 'encode' and 'decode' backwards!). Anyway, just one more  
resource.
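
For what it's worth, here is a minimal sketch of that distinction, using the same character discussed in the quoted message below. It assumes Python 3, where str holds Unicode text and bytes holds encoded data (an assumption on my part; the interpreter in this thread may behave differently), and the variable names are only for illustration.

```python
# Assumes Python 3: str is a sequence of Unicode code points,
# bytes is a sequence of encoded byte values.
text = "\xf6"  # one code point, U+00F6 (LATIN SMALL LETTER O WITH DIAERESIS)

# encode: Unicode text -> bytes in a particular encoding
utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("latin-1")

print(len(text))     # 1
print(utf8_bytes)    # b'\xc3\xb6'
print(latin1_bytes)  # b'\xf6'

# decode: bytes -> Unicode text (the reverse direction)
round_trip = utf8_bytes.decode("utf-8")
print(round_trip == text)  # True
```

So "encode" always goes from text to bytes and "decode" from bytes to text, which is why the same letter has length 1 as text but length 2 as UTF-8 bytes.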

E


On Jul 9, 2008, at 9:32 AM, Kent Johnson wrote:

> On Tue, Jul 8, 2008 at 5:19 PM, Robert Johansson
> <robert.johansson at math.umu.se> wrote:
>> Hi, I'm puzzled by the character encodings which I get when I use
>> Python with IDLE. The string '\xf6' represents a letter in the Swedish
>> alphabet when coded with utf8. On our computer with MacOSX this gets
>> coded as '\xc3\xb6' which is a string of length 2. I have configured
>> IDLE to encode utf8 but it doesn't make any difference.
>
> I think you may be a bit confused about utf-8. '\xf6' is not a utf-8
> character. U+00F6 is the Unicode (not utf-8) code point for LATIN SMALL
> LETTER O WITH DIAERESIS. '\xf6' is also the Latin-1 encoding of this
> character. The utf-8 encoding of this character is the two-byte
> sequence '\xc3\xb6'.
>
> Can you give some more specific details about what you do and what you
> see? Also you might want to do some background reading on Unicode;
> this is a good place to start:
> http://www.joelonsoftware.com/articles/Unicode.html
>
> Kent


