Understanding Unicode & encodings

clarkcb at gmail.com
Sun Jul 23 14:48:48 EDT 2006


Raphael.Benedet at gmail.com wrote:
> I tried to encode the different variables in many different encodings
> (latin1), but I always get an exception. Where does this ascii codec
> error come from? How can I simply build this query string?

Raphael,

Python uses 'ascii' as its system default encoding; the setting lives
in the library file site.py (/usr/lib/python2.4/site.py on my Gentoo
machine). I solved the problem you're describing by creating a
sitecustomize.py file that redefines the default as 'utf-8'. The call
has to go in that file because site.py deletes sys.setdefaultencoding
once startup has finished. The entire file looks like this:

--------
'''
Site customization: change default encoding to UTF-8
'''
import sys
sys.setdefaultencoding('utf-8')
--------

For more info on creating a sitecustomize.py file, read the comments in
the site.py file.

I use UTF-8 because I do a lot of multilingual text manipulation, but
if all you're concerned about is Western European text, you could also
use 'latin1'.
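
To make the difference between the codecs concrete, here is a small
sketch. The sample strings are made up for illustration, and it only
uses the standard encode/decode methods, so it does not depend on the
default encoding at all. It shows the same text encoded as UTF-8 and as
latin1, the failure you get when non-ASCII bytes are decoded with the
'ascii' codec (which, as far as I can tell, is what the implicit
conversion is doing), and a string that latin1 cannot represent:

```python
# Sketch only: the sample strings are made up for illustration.
text = u"caf\u00e9"                      # 'cafe' with an accented e

utf8_bytes = text.encode("utf-8")        # accented letter takes two bytes
latin1_bytes = text.encode("latin1")     # accented letter takes one byte

# Decoding non-ASCII bytes with the 'ascii' codec fails with
# UnicodeDecodeError.
try:
    utf8_bytes.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc

# latin1 only covers Western European characters; other scripts
# cannot be encoded with it, while UTF-8 handles them.
japanese = u"\u65e5\u672c"               # two kanji characters
try:
    japanese.encode("latin1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False
```

Encoding and decoding explicitly like this is an alternative to
changing the default: you always know which codec is in play.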

This gets you halfway there. Beyond that you need to "stringify" the
(potentially Unicode) strings during concatenation, e.g.:

self.dbCursor.execute("""INSERT INTO track (name, nbr, idartist, idalbum, path)
                         VALUES ('%s', %s, %s, %s, '%s')""" %
                      (str(track), nbr, idartist, idalbum, path))

(Assuming that track is the offending string.) I'm not exactly sure why
this explicit conversion is necessary, since it is supposed to happen
automatically, but without it I get the same UnicodeDecodeError.
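
A different technique from the one above, which may sidestep the
concatenation altogether, is to let the database driver substitute the
values. This is only a sketch under assumptions: I don't know which
driver you are using, so it uses the standard sqlite3 module with an
in-memory database as a stand-in, and the table layout and sample
values are hypothetical. The placeholder syntax depends on the driver
(sqlite3 uses '?', some others use '%s'), so check your driver's
documentation:

```python
import sqlite3

# Stand-in database: an in-memory SQLite table with the same columns
# as the INSERT statement above (column types are assumptions).
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute(
    "CREATE TABLE track (name TEXT, nbr INTEGER, idartist INTEGER, "
    "idalbum INTEGER, path TEXT)"
)

# Hypothetical sample values containing non-ASCII characters.
track = u"Dvo\u0159\u00e1k: Symfonie \u010d. 9"
path = u"/music/dvo\u0159\u00e1k/09.ogg"

# The values are passed separately from the SQL text, so no string
# formatting or concatenation happens in our own code.
cursor.execute(
    "INSERT INTO track (name, nbr, idartist, idalbum, path) "
    "VALUES (?, ?, ?, ?, ?)",
    (track, 1, 7, 3, path),
)
conn.commit()

# Read the row back to check that the text survived unchanged.
row = cursor.execute("SELECT name, path FROM track").fetchone()
```

Passing the values this way also takes care of quoting, so a track
name containing an apostrophe would not break the statement.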

Hope this helps,
Cary



