Using Unicode scripts

Gerhard Häring gh at ghaering.de
Fri Jul 18 06:35:38 EDT 2003


yzzzzz wrote:
> Hi,

Hi "yzzzzz",

> I am writing my python programs using a Unicode text editor. The files are
> encoded in UTF-8. Python's default encoding seems to be Latin 1 (ISO-8859-1)
> or maybe Windows-1252 (CP1252) which aren't compatible with UTF-8.
> 
> For example, if I type print "é", it prints Ã©. If I use a unicode string:
> a=u"é" and if I choose to encode it in UTF-8, I get 4 Latin 1 characters,
> which makes sense if the interpreter thinks I typed in u"Ã©".
> 
> How can I solve this problem?

You might want to read the thread on this list/newsgroup I started
yesterday called "Unicode problem".

Is it feasible for you to upgrade to Python 2.3? If so, I'd recommend
doing that. 2.3 is pretty close to release now, and it supports source
files in Unicode encodings. If your Unicode editor saves the text file
with a UTF-8 BOM (it should), then under Python 2.3 your scripts will
work as expected.
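
To see what is probably going on with your "é", here is a rough sketch,
assuming the UTF-8 bytes from your editor get interpreted as Latin-1
somewhere along the way. The helper name starts_with_utf8_bom is made up
for illustration; codecs.BOM_UTF8 is the real constant from the standard
library.

```python
import codecs

# The editor writes "é" (U+00E9) as two UTF-8 bytes: 0xC3 0xA9.
utf8_bytes = u"\u00e9".encode("utf-8")

# Assumption: those bytes are read as Latin-1, one character per byte.
# That gives the two characters U+00C3 and U+00A9.
misread = utf8_bytes.decode("latin-1")

# Encoding the misread string to UTF-8 again doubles it to four bytes.
double_encoded = misread.encode("utf-8")


def starts_with_utf8_bom(data):
    """Hypothetical helper: True if the raw file contents begin with the UTF-8 BOM."""
    return data.startswith(codecs.BOM_UTF8)
```

You could run starts_with_utf8_bom on the raw bytes of one of your
scripts to check whether your editor really writes the BOM.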

> Thank you
> 
> PS. I have no problem using Unicode strings in Python, I know how to
> manipulate and convert them, I'm just looking for how to specify the default
> encoding for the scripts I write.

See PEP 263 (http://www.python.org/peps/pep-0263.html), which describes
how this is implemented in Python 2.3.
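
A minimal sketch of what such a script could look like, assuming your
editor saves the file as UTF-8. The variable names are only for
illustration.

```python
# -*- coding: utf-8 -*-
# The comment on the first line is the PEP 263 encoding declaration. It has to
# appear on the first or second line of the file and must match the encoding
# the editor actually saves in.

text = u"é"                     # decoded as one character: U+00E9
encoded = text.encode("utf-8")  # the two bytes 0xC3 0xA9

assert len(text) == 1
assert len(encoded) == 2
```

With the declaration in place, the interpreter decodes the Unicode
literal as UTF-8, so you get one character instead of two.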

-- Gerhard




