Unicode error in exefied Python script

Martin v. Löwis martin at v.loewis.de
Tue Jan 21 07:15:52 EST 2003


Christian Bretterklieber <cb at orcl.net> writes:

> > No. Instead, you should modify your script not to rely on the system
> > default encoding.
> 
>   This sounds like a lot of work. 

It's not a lot of work if you design carefully. I suggest following
this principle (sketched below):
- convert all character data from byte strings to Unicode strings
  as early as possible in the processing chain (e.g. when reading
  them from a file, or a socket)
- convert all such data back to byte strings as late as possible
  (i.e. immediately before performing output)
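
For example, a processing chain built along those lines might look
roughly like this (a minimal sketch assuming Python 2.x; the file
names and the Latin-1 encoding are just placeholders for whatever
your data actually uses):

    import codecs

    # Decode at the input boundary: with an encoding given,
    # codecs.open returns a file object that yields Unicode strings.
    infile = codecs.open('input.txt', 'r', 'latin-1')
    text = infile.read()            # text is a unicode object
    infile.close()

    # ... all further processing works on Unicode strings ...
    text = text.upper()

    # Encode at the output boundary, immediately before writing.
    outfile = codecs.open('output.txt', 'w', 'latin-1')
    outfile.write(text)             # converted back to Latin-1 here
    outfile.close()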

> I have data from various sources. And
> does that mean if I set a simple string within my script, I also need to
> call encode? Like:
> 
>   mystring = 'Suppi String äüß ...'.encode('latin1')

As another design principle, try to avoid non-ASCII characters in
source code (use gettext instead, at least if you are targeting
multiple languages). If you need them, be aware that Python 2.2
doesn't really support encodings for such strings - they are in the
encoding that the source code file happens to be in. In Python 2.3,
you will have to declare that encoding to Python.
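
For illustration, such a declaration (PEP 263) is a comment at the
top of the file; the literal below is still a byte string, but the
interpreter then knows which encoding the source uses:

    # -*- coding: latin-1 -*-
    # Declares that this source file is Latin-1 encoded (Python 2.3).
    # The literal below remains a byte string in that encoding.
    mystring = 'Suppi String äüß ...'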

In any case, as this is a string literal, obtaining a Unicode object
from it would require the unicode builtin, or the .decode method; the
.encode method will give a byte string back to you.
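
That is, starting from the byte string literal above, you would
write something like this (assuming the bytes really are Latin-1):

    s = 'Suppi String äüß ...'       # byte string, in the file's encoding
    u1 = unicode(s, 'latin-1')       # Unicode object via the builtin
    u2 = s.decode('latin-1')         # the same thing via .decode (2.2+)
    b  = u1.encode('latin-1')        # .encode goes back to a byte string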

In Python 2.3, I would recommend that you declare the encoding, and
use Unicode literals instead of string literals (due to a limitation
in Python 2.2, Unicode literals are only meaningful in that version
when the source file is encoded in Latin-1).
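
Put together, a Python 2.3 version of your example would then be
something like:

    # -*- coding: latin-1 -*-
    # With the source encoding declared, a Unicode literal needs no
    # explicit decode step at all.
    mystring = u'Suppi String äüß ...'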

Regards,
Martin



