Removing Unicode from Python?
Hallvard B Furuseth (nospam nospam)
h.b.furuseth at usit.uio.no
Thu Oct 30 13:34:38 EST 2003
Paradox wrote:
> Isn't utf-8 the same as latin-1.
Not at all. Latin-1 consists of ASCII + 96 8-bit characters, all
encoded in one byte each. UTF-8 is an encoding of the entire ISO 10646
(Unicode) character set, which includes most characters in the world.
The 8-bit latin-1 characters are encoded with two bytes each in UTF-8,
while ASCII characters are encoded as plain ASCII. Maybe what you are
thinking of is that latin-1 characters have the same numeric codes
(code points) in ISO 10646 as they do in latin-1. Only the numbers
match; the bytes UTF-8 uses to store them do not.
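A small sketch of the difference, assuming Python 3 (where str is
already Unicode text and encode() returns bytes). The sample character
is an arbitrary choice for illustration:

```python
# Python 3 sketch; the sample character is an arbitrary illustration.
ch = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE, one of the 8-bit latin-1 characters

latin1_bytes = ch.encode("iso8859-1")
utf8_bytes = ch.encode("utf-8")

print(len(latin1_bytes))  # 1
print(len(utf8_bytes))    # 2

# The Unicode code point equals the latin-1 byte value ...
print(ord(ch) == latin1_bytes[0])  # True
# ... but the UTF-8 byte sequence is different.
print(utf8_bytes == latin1_bytes)  # False

# ASCII characters are encoded identically in ASCII, latin-1 and UTF-8.
print("A".encode("utf-8") == "A".encode("ascii") == "A".encode("iso8859-1"))  # True
```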
To convert the string str from utf-8 to latin-1, use
    unicode(str, 'utf-8').encode('iso8859-1')
To convert from latin-1 to utf-8, use
    unicode(str, 'iso8859-1').encode('utf-8')
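For comparison, a rough equivalent under Python 3, which has no
unicode() built-in: there the encoded data would be a bytes object,
and the same two steps become decode() followed by encode(). The
function names below are hypothetical and only for illustration:

```python
# Python 3 sketch; the function names are hypothetical, not from the post.

def utf8_to_latin1(data: bytes) -> bytes:
    """Decode UTF-8 bytes to text, then re-encode that text as latin-1."""
    return data.decode("utf-8").encode("iso8859-1")


def latin1_to_utf8(data: bytes) -> bytes:
    """Decode latin-1 bytes to text, then re-encode that text as UTF-8."""
    return data.decode("iso8859-1").encode("utf-8")


utf8_data = b"caf\xc3\xa9"  # the text "caf" + e-acute, encoded as UTF-8
latin1_data = utf8_to_latin1(utf8_data)
print(latin1_data)                               # b'caf\xe9'
print(latin1_to_utf8(latin1_data) == utf8_data)  # True
```

Note that the utf-8 to latin-1 direction raises UnicodeEncodeError if
the text contains a character that latin-1 does not have, unless an
error handler such as errors="replace" is passed to encode().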
--
Hallvard