Understanding Unicode & encodings

John Machin sjmachin at lexicon.net
Sun Jul 23 18:20:49 EDT 2006


clarkcb at gmail.com wrote:
> Raphael.Benedet at gmail.com wrote:
> > I tried to encode the different variables in many different encodings
> > (latin1), but I always get an exception. Where does this ascii codec
> > error come from? How can I simply build this query string?
>
> Raphael,
>
> The 'ascii' encoding is set in the python library file site.py
> (/usr/lib/python2.4/site.py on my gentoo machine) as the system default
> encoding for python. The solution I used to the problem you're
> describing was to create a sitecustomize.py file and redefine the
> encoding as 'utf-8'.

Here is the word from on high (effbot, April 2006):
"""
(you're not supposed to change the default encoding. don't
do that; it'll only cause problems in the long run).
"""

That exception is a wake-up call -- it means "you don't have a clue how
your 8-bit strings are encoded". You are intended to obtain a clue
(case by case), and specify the encoding explicitly (case by case).
Sure, the current app might dump utf_8 on you. What happens if the next
app dumps latin1 or cp1251 or big5 on you?
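
A minimal sketch of what "specify the encoding explicitly (case by case)"
can look like. The byte strings and variable names are made up for
illustration, and the b prefix assumes Python 2.6 or later (on 2.4, drop
it; plain '...' literals are already 8-bit strings).

```python
# Hypothetical 8-bit strings, as three different apps might dump them on you.
from_utf8_app = b'caf\xc3\xa9'          # u'caf\xe9' encoded as utf_8
from_latin1_app = b'caf\xe9'            # the same text encoded as latin1
from_cp1251_app = b'\xea\xe0\xf4\xe5'   # Cyrillic text encoded as cp1251

# Decode each one with the encoding you have established for that source.
name_a = from_utf8_app.decode('utf_8')
name_b = from_latin1_app.decode('latin1')
name_c = from_cp1251_app.decode('cp1251')

# Guessing wrong is sometimes loud ...
try:
    from_latin1_app.decode('utf_8')
    wrong_guess_failed = False
except UnicodeDecodeError:
    wrong_guess_failed = True

# ... and sometimes silent: utf_8 bytes decoded as latin1 raise nothing,
# they just produce the wrong characters.
mojibake = from_utf8_app.decode('latin1')
```

Once decoded, name_a and name_b are the same Unicode string even though
the bytes differed. A single process-wide default cannot get all three
sources right, which is why the choice has to be made per source.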

> This gets you halfway there. Beyond that you need to "stringify" the
> (potentially Unicode) strings during concatenation, e.g.:
>
> self.dbCursor.execute("""INSERT INTO track (name, nbr, idartist,
> idalbum, path)
>                          VALUES ('%s', %s, %s, %s, '%s')""" % \
>                          (str(track), nbr, idartist, idalbum, path))
>
> (Assuming that track is the offending string.) I'm not exactly sure why
> this explicit conversion is necessary, as it is supposed to happen
> automatically, but I get the same UnicodeDecodeError error without it.

Perhaps if you were to supply info like which DBMS, type of the
offending column in the DB, Python type of the value that *appears* to
need stringification, ... we could help you too.
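
In the meantime, one thing that may make the str() call unnecessary is to
let the DB API driver bind the values instead of building the SQL with %.
This is a rough sketch only: the DBMS is unknown here, so the
standard-library sqlite3 module (Python 2.5 and later) stands in for it,
and the sample values are made up.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the unknown DBMS.
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute("""CREATE TABLE track
               (name TEXT, nbr INTEGER, idartist INTEGER,
                idalbum INTEGER, path TEXT)""")

# Hypothetical values; track and path are Unicode objects, not 8-bit strings.
track = u'Caf\xe9 del Mar'
path = u'/music/caf\xe9.mp3'
nbr, idartist, idalbum = 1, 10, 20

# The driver binds the values itself: no str() and no hand-written quotes.
cur.execute("""INSERT INTO track (name, nbr, idartist, idalbum, path)
               VALUES (?, ?, ?, ?, ?)""",
            (track, nbr, idartist, idalbum, path))
conn.commit()

# Read the row back to check that the Unicode values survived the round trip.
row = cur.execute(
    "SELECT name, nbr, idartist, idalbum, path FROM track").fetchone()
```

The placeholder style is driver-specific (sqlite3 uses ?, other drivers
use %s or named parameters), so check your driver's paramstyle before
copying this.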

Cheers,
John



