UTF-8 usage in Python 2.0
Erno Kuusela
erno-news at erno.iki.fi
Fri Oct 27 18:59:42 EDT 2000
| On the professional side, I receive html files translated to french.
| They are coded in html entity. I need to translate them to UTF-8. I
| currently use Tidy to do this, but I need to do some manual
| modifications after it.
you can use the unicode() built-in function to convert old-fashioned
8-bit strings to unicode, using various character sets (i don't
remember what character set macos uses, but you get the idea):
    s = unicode('kääpiö', 'latin-1')
now s is a unicode string equivalent to the unicode string constant
    u'k\N{LATIN SMALL LETTER A WITH DIAERESIS}\N{LATIN SMALL LETTER A WITH DIAERESIS}pi\N{LATIN SMALL LETTER O WITH DIAERESIS}'
you can convert it back to an 8-bit string with
s.encode(encoding-name). for example:
    s.encode('utf-8') -> 'k\303\244\303\244pi\303\266'
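
for anyone reading this with a current python 3 interpreter, here is a rough sketch of the same round trip. it is an assumption on my part that you are on python 3: there unicode() no longer exists and bytes.decode() does its job. the entity-coded sample string is made up for illustration, and html.unescape() is the python 3 standard library call for entities, not something python 2.0 has.

```python
# sketch for present-day python 3 (assumption: the text above targets python 2.0,
# where the same steps are spelled unicode(raw, 'latin-1') and s.encode('utf-8')).
import html

raw = b'k\xe4\xe4pi\xf6'        # 'kääpiö' as latin-1 bytes (an old-fashioned 8-bit string)
s = raw.decode('latin-1')       # text string; python 3 spelling of unicode(raw, 'latin-1')
utf8 = s.encode('utf-8')        # back to an 8-bit string, this time in utf-8

# entity-coded input like in the quoted question (sample string is hypothetical)
from_entities = html.unescape('k&auml;&auml;pi&ouml;')
utf8_from_entities = from_entities.encode('utf-8')
```

the two utf-8 results come out identical, since both paths go through the same text string before encoding.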
-- erno