Python Unicode to String conversion
Gabriel Genellina
gagsl-py2 at yahoo.com.ar
Mon Sep 17 05:30:40 EDT 2007
On Mon, 17 Sep 2007 01:33:14 -0300, Richard Levasseur
<richardlev at gmail.com> wrote:
> When dealing with unicode, I've run into situations where I have
> multiple encodings in the same string, usually latin1 and utf8
> (latin1 != ascii, and latin1 != utf8, and they don't play nice
> together). So, for future readers, if you have problems dealing with
> unicode encode and decode, try using a mix of latin1 and utf8
> encodings to figure out what's going on, and what characters are
> fubar'ing the process.
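The kind of mismatch described above can be reproduced in a few lines.
This is only a minimal sketch, assuming Python 3 (where str is Unicode
text and bytes is a byte string); the sample word and the variable names
are illustrative and do not come from the original message.

```python
# Python 3 sketch: str is Unicode text, bytes is a byte string.
text = "caf\u00e9"  # the word "cafe" with an acute accent on the e

utf8_bytes = text.encode("utf-8")      # the accented letter becomes two bytes
latin1_bytes = text.encode("latin-1")  # the accented letter becomes one byte

# Decoding UTF-8 bytes as Latin-1 never raises, because every byte value
# is valid Latin-1, but the result is silently wrong (mojibake).
wrong = utf8_bytes.decode("latin-1")

# Decoding Latin-1 bytes as UTF-8 fails here: the byte 0xE9 announces a
# multi-byte sequence that never arrives.
try:
    latin1_bytes.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

print(repr(wrong), strict_failed)
```

The silent case is the dangerous one: the wrong decode succeeds, and the
damage only shows up later when the text is displayed or compared.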
Life is easier if you follow these guidelines:
- Always work internally in Unicode (not byte strings).
- Decode all input data (read from files, received over an Internet
connection, typed by the user...) from byte strings into Unicode as
early as possible. (You should know which encoding your data comes in, in
each case.)
- Encode all output data (written to files, printed to the screen, etc.)
from Unicode into byte strings as late as possible.
This way, unless your input data is garbage, you can never mix strings
from different encodings.
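The guidelines above can be sketched roughly as follows. This assumes
Python 3, where str is the Unicode type and bytes is the byte string; the
helper names read_text and write_text, the sample data, and the choice of
in-memory streams are hypothetical, picked only for illustration.

```python
import io


def read_text(stream, encoding):
    """Decode input bytes into Unicode as soon as they are read."""
    return stream.read().decode(encoding)


def write_text(stream, text, encoding):
    """Encode Unicode into bytes only at the moment of output."""
    stream.write(text.encode(encoding))


# Two inputs that arrive in different encodings (simulated in memory).
latin1_input = io.BytesIO("Se\u00f1or".encode("latin-1"))
utf8_input = io.BytesIO(" Gonz\u00e1lez".encode("utf-8"))

# Decode early: each source is decoded with its own known encoding.
first = read_text(latin1_input, "latin-1")
second = read_text(utf8_input, "utf-8")

# Work internally in Unicode: the original encodings no longer matter.
full_name = (first + second).upper()

# Encode late: pick one output encoding at the boundary.
output = io.BytesIO()
write_text(output, full_name, "utf-8")
result = output.getvalue()
```

Because both inputs are decoded at the boundary, the concatenation in the
middle operates on Unicode only, and no bytes from different encodings
ever meet.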
For further information, read the Unicode Howto
<http://www.amk.ca/python/howto/unicode> and this excerpt from the "Python
Cookbook", by Alex Martelli
<http://www.onlamp.com/pub/a/python/excerpt/pythonckbk_chap1/index.html>
--
Gabriel Genellina