Removing Unicode from Python?

Hallvard B Furuseth (nospam nospam) h.b.furuseth at usit.uio.no
Thu Oct 30 13:34:38 EST 2003


Paradox wrote:

> Isn't utf-8 the same as latin-1.

Not at all.  Latin-1 (ISO 8859-1) consists of ASCII plus 96 8-bit
characters, each encoded in a single byte.  UTF-8 is an encoding of the
entire ISO 10646-1 (Unicode) character set, which includes most
characters in the world.  In UTF-8 the 8-bit Latin-1 characters take two
bytes each, while ASCII characters are encoded as plain ASCII.  Maybe
what you are thinking of is that Latin-1 characters have the same
numeric codes in ISO 10646-1 as they do in Latin-1.
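
A small sketch of the difference.  It is written for Python 3, which is
an assumption on my part (it postdates this thread), and the character
e-acute (U+00E9) is just an arbitrary example:

```python
# Python 3 sketch; U+00E9 (e with acute accent) is only an example character.
ch = "\u00e9"

latin1_bytes = ch.encode("iso8859-1")   # one byte in Latin-1
utf8_bytes = ch.encode("utf-8")         # two bytes in UTF-8

# The Unicode code point equals the Latin-1 byte value...
same_code = ord(ch) == latin1_bytes[0]

# ...while an ASCII character is byte-for-byte identical in both encodings.
ascii_same = "A".encode("iso8859-1") == "A".encode("utf-8")

print(len(latin1_bytes), len(utf8_bytes), same_code, ascii_same)
```

So the numeric codes agree, but the bytes on disk do not, except for
plain ASCII.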

To convert the byte string s from UTF-8 to Latin-1, use
   unicode(s, 'utf-8').encode('iso8859-1')
This fails with a UnicodeError if s contains characters that do not
exist in Latin-1.
To convert from Latin-1 to UTF-8, use
   unicode(s, 'iso8859-1').encode('utf-8')
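
The calls above are for Python 2.  In Python 3, where unicode() no
longer exists, the same round trip might look roughly like this.  The
function names are made up for illustration, and the euro sign is just
one example of a character that Latin-1 lacks:

```python
def utf8_to_latin1(data: bytes) -> bytes:
    """Decode UTF-8 bytes to text, then re-encode the text as Latin-1."""
    return data.decode("utf-8").encode("iso8859-1")


def latin1_to_utf8(data: bytes) -> bytes:
    """Decode Latin-1 bytes to text, then re-encode the text as UTF-8."""
    return data.decode("iso8859-1").encode("utf-8")


utf8_cafe = b"caf\xc3\xa9"              # "cafe" with e-acute, in UTF-8
latin1_cafe = utf8_to_latin1(utf8_cafe)  # the same text, in Latin-1
round_trip = latin1_to_utf8(latin1_cafe)

# A character outside Latin-1 (here the euro sign, U+20AC) cannot be
# converted; encode() raises UnicodeEncodeError.
try:
    utf8_to_latin1("\u20ac".encode("utf-8"))
    euro_failed = False
except UnicodeEncodeError:
    euro_failed = True
```

If lossy output is acceptable, encode() also takes an errors argument
such as 'replace' or 'ignore'.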

-- 
Hallvard



