UTF-8 usage in Python 2.0

Erno Kuusela erno-news at erno.iki.fi
Fri Oct 27 18:59:42 EDT 2000


 | On the professional side, I receive HTML files translated into French.
 | They are encoded with HTML entities, and I need to convert them to UTF-8. I
 | currently use Tidy to do this, but I need to make some manual
 | modifications afterwards.

you can use the unicode() built-in function to convert old-fashioned
8-bit strings to unicode, using various character sets (i don't
remember what character set macos uses, but you get the idea):

s = unicode('kääpiö', 'latin-1')

now s is a unicode string equivalent to the unicode string constant
u'k\N{LATIN SMALL LETTER A WITH DIAERESIS}\N{LATIN SMALL LETTER A WITH DIAERESIS}pi\N{LATIN SMALL LETTER O WITH DIAERESIS}'
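
for readers on python 3, where the unicode() built-in no longer exists,
a rough equivalent is bytes.decode(). this is only a sketch and not part
of the original message; the byte string below is an assumed input,
namely 'kääpiö' as it would be stored in latin-1:

```python
# Python 3 sketch: Python 2's unicode(x, 'latin-1') becomes x.decode('latin-1').
raw = b'k\xe4\xe4pi\xf6'      # assumed input: 'kääpiö' as latin-1 bytes
s = raw.decode('latin-1')     # now a text (unicode) string

# The same string spelled with named escapes, as in the constant above.
named = ('k\N{LATIN SMALL LETTER A WITH DIAERESIS}'
         '\N{LATIN SMALL LETTER A WITH DIAERESIS}'
         'pi\N{LATIN SMALL LETTER O WITH DIAERESIS}')

print(s == named)
```

the comparison checks that decoding the latin-1 bytes gives the same six
characters as the named-escape form.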

you can convert it back to an 8-bit string with
s.encode(encoding-name). for example:

s.encode('utf-8') -> 'k\303\244\303\244pi\303\266'
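
for the html entity case from the question, here is a minimal sketch in
python 3 (html.unescape is in the python 3 standard library, not in
python 2.0). the function name entities_to_utf8 and the sample input are
made up for illustration:

```python
import html

def entities_to_utf8(text):
    """Turn HTML character references into real characters, then encode as UTF-8.

    Hypothetical helper, for illustration only.
    """
    return html.unescape(text).encode('utf-8')

sample = 'k&auml;&auml;pi&ouml;'   # assumed input containing HTML entities
out = entities_to_utf8(sample)
print(out)
```

note that html.unescape also turns &lt; and &amp; into literal
characters, so running it over a whole html file would damage the markup.
in practice it would have to be applied to the text content only, or the
markup-significant entities would have to be left alone.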

  -- erno


