[Tutor] character encoding
wesley chun
wescpy at gmail.com
Wed Jul 9 08:12:24 CEST 2008
> > Hi, I'm puzzled by the character encodings which I get when I use Python
> > with IDLE. The string '\xf6' represents a letter in the Swedish alphabet
> > when coded with utf8. On our computer with MacOSX this gets coded as
> > '\xc3\xb6' which is a string of length 2. I have configured IDLE to encode
> > utf8 but it doesn't make any difference.
>
> I think you may be a bit confused about utf-8. '\xf6' is not a utf-8
> character. U+00F6 is the Unicode (not utf-8) codepoint for LATIN SMALL
> LETTER O WITH DIAERESIS. '\xf6' is also the Latin-1 encoding of this
> character. The utf-8 encoding of this character is the two-byte
> sequence '\xc3\xb6'.
>
> Also you might want to do some background reading on Unicode;
> this is a good place to start:
> http://www.joelonsoftware.com/articles/Unicode.html
Kent is quite correct, and here is some Python code to demo it:
>>> x = u'\xf6'
>>> x
u'\xf6'
>>> print x
ö
>>> y = x.encode('utf-8')
>>> y
'\xc3\xb6'
>>> print y
ö
in the code above, our source string 'x' is a Unicode string, which is
"pure," meaning that it has not been encoded by any codec. we encode
this Unicode string into a UTF-8 byte string 'y', which takes up 2
bytes, as Kent has mentioned already. dumping the variables always
works because the repr is plain ASCII; printing them shows the right
character because our terminal was set to UTF-8.
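for anyone on Python 3, here is a minimal sketch of the same round trip. it assumes Python 3 semantics, where str holds Unicode text and bytes holds encoded data (the session above is Python 2). the names x and y mirror the session; the decode step and the name 'back' are additions, just to show that the encoding is reversible:

```python
# Assumption: Python 3, where str is Unicode text and bytes is encoded data.
x = '\xf6'               # one code point: U+00F6, LATIN SMALL LETTER O WITH DIAERESIS
y = x.encode('utf-8')    # the UTF-8 encoding of that code point, as bytes

print(len(x))            # number of code points in the text
print(y, len(y))         # the encoded bytes and how many there are

back = y.decode('utf-8') # decoding with the same codec recovers the text
print(back == x)
```

the length of x counts characters, while the length of y counts bytes, which is why the two differ.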
if we switch our terminal output to Latin-1, then we can view it that
way -- notice that the Latin-1 encoding only takes 1 byte instead of 2
for UTF-8:
>>> z = x.encode('latin-1')
>>> z
'\xf6'
>>> print z
ö
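printing only looks right when the terminal's encoding matches the bytes being sent to it. a rough sketch of what goes wrong otherwise, again assuming Python 3 and using explicit decode calls to stand in for what a mismatched terminal would do (the names below are made up for illustration):

```python
# Assumption: Python 3; decode() here simulates a terminal interpreting bytes.
x = '\xf6'
utf8_bytes = x.encode('utf-8')       # two bytes
latin1_bytes = x.encode('latin-1')   # one byte

# UTF-8 bytes read as Latin-1: each byte is taken as a separate character,
# so one letter turns into two unrelated ones.
garbled = utf8_bytes.decode('latin-1')
print(repr(garbled), len(garbled))

# Latin-1 byte read as UTF-8: 0xf6 cannot start a valid UTF-8 sequence,
# so a strict decoder rejects it.
try:
    latin1_bytes.decode('utf-8')
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
print(decode_failed)
```

a real terminal usually shows a replacement glyph instead of raising an error, but the cause is the same mismatch.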
here's another recommended Unicode document that is slightly more
Python-oriented:
http://wiki.pylonshq.com/display/pylonsdocs/Unicode
cheers,
-- wesley
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
"Core Python Programming", Prentice Hall, (c)2007,2001
http://corepython.com
wesley.j.chun :: wescpy-at-gmail.com
python training and technical consulting
cyberweb.consulting : silicon valley, ca
http://cyberwebconsulting.com