Understanding Unicode & encodings

John Machin sjmachin at lexicon.net
Sun Jul 23 17:34:22 EDT 2006


Jim wrote:
> Raphael.Benedet at gmail.com wrote:
> > Hello,
> >
> > For my application, I would like to execute an SQL query like this:
> > self.dbCursor.execute("INSERT INTO track (name, nbr, idartist, idalbum,
> > path) VALUES ('%s', %s, %s, %s, '%s')" % (track, nbr, idartist,
> > idalbum, path))
> No, I'll bet that you'd like to run something like
>   self.dcCursor.execute("INSERT INTO track (name, nbr, idartist,
> idalbum,path) VALUES (%(track)s, %(nbr)s,
> %(idartist)s,%(idalbum)s,'%(path)s')",
> {'track':track,'nbr':nbr,'idartist':idartist,'idalbum':idalbum,'path':path})
> (only without my typos).  That's an improvement for a number of reasons,
> one of which is that the system will quote for you, for instance in
> idartist="John's Beer" changing the single quote to two single quotes
> to suit SQL.

I see no improvement here.

The OP's code is effectively:

  sql = "INSERT INTO track (name, ..., path) VALUES ('%s', ..., '%s')"
  value_tuple = (track, ..., path)
  self.dbCursor.execute(sql % value_tuple)

Your suggested replacement is effectively:

  sql = "INSERT INTO track (name, ...,path) VALUES (%(track)s, ...,'%(path)s')"
  str_fmt_dict = {'track':track, ...,'path':path}
  self.dcCursor.execute(sql, str_fmt_dict)

Well, that won't run at all. Let's correct the presumed typo:

   self.dcCursor.execute(sql % str_fmt_dict)

Now, the only practical difference is that you have REMOVED the OP's
explicit quoting of the first column value. Changing the string
formatting from the %s style to the %(column_name)s style achieves
nothing useful. You are presenting the "system" with a constant SQL
string -- it is not going to get any chance to fiddle with the quoting.
However the verbosity index has gone off the scale: each column name is
mentioned 4 times (previously 1).
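
A minimal sketch of that point, written in Python 3 syntax with made-up sample values (the two-column statement is a cut-down stand-in, not the OP's real one). Both formatting styles build exactly the same constant string before the database ever sees it, and the embedded single quote is left unescaped in both:

```python
# Hypothetical sample values; the real ones come from the OP's application.
track = "John's Beer"
path = "/music/track01.mp3"

# Positional %s formatting, as in the OP's code.
positional_sql = (
    "INSERT INTO track (name, path) VALUES ('%s', '%s')" % (track, path)
)

# Named %(key)s formatting, as in the suggested replacement.
named_sql = (
    "INSERT INTO track (name, path) VALUES ('%(track)s', '%(path)s')"
    % {"track": track, "path": path}
)

# The database layer receives one finished string either way.
print(positional_sql == named_sql)
print(positional_sql)
```

The apostrophe in the first value ends the SQL string literal early, which is the quoting problem that string formatting of either kind cannot solve.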

I would suggest the standard default approach:

  sql = "INSERT INTO track (name, ..., path) VALUES (?, ..., ?)"
  value_tuple = (track, ..., path)
  self.dbCursor.execute(sql, value_tuple)

The benefits of doing this include that the DBAPI layer gets to
determine the type of each incoming value and the type of the
corresponding DB column, and makes the appropriate adjustments,
including quoting each value properly, if quoting is necessary.
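
As a hedged illustration only, here is the same pattern run against an in-memory SQLite database in Python 3 syntax. The sqlite3 module is my choice for the sketch because it ships with Python and uses the `?` placeholder; the OP has not said which database or driver is in use, and other DBAPI modules may expect a different placeholder style (each module advertises its own in its `paramstyle` attribute). The three-column schema and the values are invented for the example.

```python
import sqlite3

# In-memory database with a cut-down, hypothetical 'track' table.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE track (name TEXT, nbr INTEGER, path TEXT)")

# The SQL text stays constant; the values travel separately.
sql = "INSERT INTO track (name, nbr, path) VALUES (?, ?, ?)"
value_tuple = ("John's Beer", 3, "/music/caf\u00e9.mp3")
cur.execute(sql, value_tuple)
conn.commit()

# Read the row back to confirm the quote and the non-ASCII
# character both survived without any manual escaping.
cur.execute("SELECT name, nbr, path FROM track")
row = cur.fetchone()
print(row)
conn.close()
```

Note that no quote characters appear around the placeholders: the driver decides how each value is passed to the database.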

> > Every time I execute this, I get an exception like
> > this:
> >
> > UnicodeDecodeError: 'ascii' codec can't decode byte 0xa1 in position
> > 64: ordinal not in range(128)
> >
> > I tried to encode the different variables in many different encodings
> > (latin1), but I always get an exception. Where does this ascii codec
> > error come from? How can I simply build this query string?

> Some more information may help: is the error returned before or during
> the execute call?  If before, then the execute() call is a distraction.
>  If during, then what is your dB, what is its encoding (is the dB
> using latin1, or does the dB only accept ascii?), and what are you
> using to connect to it?

These are very sensible questions. Some more q's for the OP:

(1) What is the schema for the 'track' table?

(2) "I tried to encode the different variables in many different
encodings (latin1)" -- you say "many different encodings" but mention
only one ... please explain and/or show a sample of the actual code of
the "many different" attempts.

(3) You said that your input values (produced by some libblahblah) were
in Unicode -- are you sure? The exception that you got means that it
was trying to convert *from* an 8-bit string *to* Unicode, but used the
default ASCII codec (which couldn't hack it). Try doing this before the
execute() call:

  print 'track', type(track), repr(track)
  ...
  print 'path', type(path), repr(path)

and split the execute() call into three separate statements along the
lines shown above (build the SQL string, build the value tuple, then
call execute()), so we can see (as Jim asked) where the exception is
being raised.
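
To illustrate the kind of failure described in (3), here is a small sketch in Python 3 syntax. Python 3 no longer converts 8-bit strings to Unicode implicitly, so the sketch calls decode() explicitly to provoke the same error; the sample bytes are invented, and latin-1 is only a guess at what the OP's data might be encoded in.

```python
# Hypothetical 8-bit input containing the byte 0xa1 from the traceback.
raw = b"caf\xa1"

# Decoding with the ASCII codec fails on any byte above 0x7f.
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    error_message = str(exc)
    print(error_message)

# Decoding with a codec that covers the byte succeeds.
# latin-1 is an assumption here, not a known fact about the OP's data.
text = raw.decode("latin-1")
print(type(text), repr(text))
```

If the type/repr output for the OP's variables shows 8-bit strings rather than Unicode objects, that would point to this sort of implicit ASCII decode as the source of the exception.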

HTH,
John



