[Baypiggies] Fwd: [sugar] Database Activity

Shannon -jj Behrens jjinux at gmail.com
Mon Mar 10 16:57:34 CET 2008


On Mon, Mar 10, 2008 at 12:12 AM, Kelly Yancey <kelly at nttmcl.com> wrote:
> Edward Cherlin wrote:
>  > Python UTF-8 issue. Can anybody help?
>  >
>
>    With no code, hence relying on my psychic debugging prowess....
>
>
>   >  Question 1:  Why does pysqlite convert the utf-8 string to latin-1 in
>   >  the *query process* when my sqlite settings are for utf-8?
>   >  i.e., sqlite.h macro: #define SQLITE_UTF8 1
>
>    It doesn't*.  The poster's code did (or a library he is calling did).
>   See answer to Question 2.
>
>    * Unless you set the text_factory to something other than unicode,
>      that is.
>
>  http://oss.itsystementwicklung.de/download/pysqlite/doc/usage-guide.html#text_factory
>
>
>   >
>   >  Question 2:  Does the pysqlite user (programmer) have any control
>   >  over this query/conversion?  (I see nothing in the Python Database
>   >  API, or the pysqlite/sqlite documentation, about either automatic or
>   >  specified conversions.  Also, this conversion does not occur using
>   >  the sqlite3 client.)
>   >
>
>    Yes, see footnote to Question 1.
>    (I guess he didn't read the documentation very closely. :( )
>
>    My experience is that if a unicode string is recorded to the
>  database, it will be returned as a unicode string.
>    Now if you use that unicode object in a string context somewhere,
>  python will try to convert it based on your default encoding.  Which the
>  original poster states is "ascii".  Look fishy?  I think so.  Note that
>  the error is a Unicode*Encode*Error, not a Unicode*Decode*Error.  The
>  string returned by pysqlite was unicode, the poster's code is trying to
>  encode it into a non-unicode character set (ascii) somewhere.
>
>
>   >  Question 3:  Is this a pysqlite bug or a feature, and why?"
>
>    If my hunch is right, then it has nothing to do with pysqlite.  It is
>  a python feature.  The original poster has a bug in their code.  No
>  surprises here.
>
>    How to confirm if my hunch is right:
>
>    1.  Set the default encoding to 'utf-8'.  Run test program.  Does it
>        die?  (The original poster seems to recognize that this would be a
>        good way to identify where the problem lies but then refuses to
>        actually try it?!?)
>    2.  If the UnicodeEncodeError goes away, congratulations you have
>        identified that the bug is somewhere in your code.  You got a
>        perfectly-good unicode object back from pysqlite and
>        erroneously used it in a non-unicode string context.  Python
>        tried to convert for you.  But you asked it to convert to 'ascii'
>        which it can't do.  You got an exception.  Please do not post
>        your code for us to find your bug for you.
>    3.  Set the default encoding back to 'ascii' if you like.
>    4.  Look a little higher up in your traceback to find *your* code.
>        That is where you are erroneously using the unicode object in
>        a string context.  You may be calling a third-party API that
>        expects strings rather than unicode objects.  Read the docs.  Fix
>        your code.  Rinse and repeat until all of your UnicodeEncodeError
>        exceptions go away.
>    5.  Audit your code for further similar bugs.
>
>    The worst part about this question is that Christian Boos already
>  gave the same answer (albeit more succinctly)!
>  http://osdir.com/ml/python.db.pysqlite.user/2006-04/msg00021.html
>
>    Kelly
>
>    P.S. I was feeling pretty generous when I started my free psychic
>         debugging session.  But I swear I can't help but think this
>         was all a ploy to boost osdir's ad views and now I'm quite
>         irritable.
>
>  --
>  Kelly Yancey
>  http://kbyanc.blogspot.com/
>
>
>
>  >
>  > ---------- Forwarded message ----------
>  > From: 7150 <linux.1 at litenverden.com>
>  > Date: Sat, Mar 8, 2008 at 4:02 PM
>  > Subject: Re: [sugar] Database Activity
>  > To: sugar at lists.laptop.org
>  >
>  >  This will be my last post on this topic until I learn a bit more.
>  >
>  >  It appears that pysqlite has some data integrity issues.
>  >
>  >  Interesting link to pysqlite "bug" discussion:
>  >  http://osdir.com/ml/python.db.pysqlite.user/2006-04/msg00020.html
>  >
>  >  ---
>  >
>  >  "I have built sqlite databases containing utf-8 encoded text.  When I
>  >  access them using pysqlite, utf-8 codepoints appear to be converted to
>  >  latin-1."
>  >
>  >  It does this.
>  >
>  >  How to stop it?
>  >
>  >  An answer on the list was
>  >  (http://osdir.com/ml/python.db.pysqlite.user/2006-04/msg00022.html):
>  >
>  >   > This should be:
>  >   >
>  >   >      req.write(elem.encode('utf-8'))
>  >   >
>  >   > write() expects a str object, so what happened
>  >   > in your code was an implicit conversion of your
>  >   > unicode object to a str, doing the encoding using
>  >   > the default system encoding, here 'ascii'.
>  >
>  >  The req.write stuff is Greek to me, but I'll see what I can find out
>  >  about it.
>  >
>  >  Then from the same post:
>  >
>  >  "So, upon reflection, my situation boils down to this:
>  >
>  >  (1) A utf-8 encoded string in an sqlite database is queried using
>  >  pysqlite which returns a latin-1 string (a
>  >  gratuitous/silent/unspecified/unrequested conversion).
>  >
>  >  . . .
>  >
>  >  pysqlite QUESTIONS:
>  >
>  >  Question 1:  Why does pysqlite convert the utf-8 string to latin-1 in the
>  >  *query process* when my sqlite settings are for utf-8?
>  >  i.e., sqlite.h macro: #define SQLITE_UTF8 1
>  >
>  >  Question 2:  Does the pysqlite user (programmer) have any control over
>  >  this query/conversion?  (I see nothing in the Python Database API, or
>  >  the pysqlite/sqlite documentation, about either automatic or specified
>  >  conversions.  Also, this conversion does not occur using the sqlite3
>  >  client.)
>  >
>  >  Question 3:  Is this a pysqlite bug or a feature, and why?"
>  >
>  >  The group had no further answer for the guy.
>  >
>  >  ---
>  >
>  >  Someone else has noted that the SQLite3 client does not transform the
>  >  data.  If I learn how to query my UTF-8 database on the XO, I'll let you
>  >  know. But, it's not an XO problem.
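
To make Kelly's diagnosis concrete, here is a minimal sketch of the
failure and of the explicit-encode fix from the quoted answer.  It is
written for Python 3 (an assumption on my part, since the thread is about
Python 2), where the conversion has to be spelled out; in Python 2 the
same encode to 'ascii' happened implicitly.  The write() function is a
hypothetical stand-in for req.write(), not the real API.

```python
# Text as the database layer would hand it back: a unicode object.
text = u"caf\u00e9"


def write(data):
    """Hypothetical stand-in for req.write(): accepts encoded bytes only."""
    if not isinstance(data, bytes):
        raise TypeError("write() expects encoded bytes")
    return len(data)


# Encoding to ASCII fails because U+00E9 has no ASCII representation.
# This is the UnicodeEncodeError (not a decode error) from the thread.
try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# The fix: choose the encoding explicitly at the output boundary.
written = write(text.encode("utf-8"))
```

The point is only that the exception comes from the encode step in the
calling code, not from the query.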

I didn't actually look at the problem like Kelly did, but I can summarize:

* When talking to a database, you have to make sure that the database
itself is using the right encoding and that the connection to the
database is using the right encoding.  For MySQLdb, I like to
configure it to use UTF-8 and have it automatically take care of
encoding and decoding for me.

* Remember not to call str() on unicode objects; doing so encodes them
with the default encoding (usually 'ascii').

* Call print repr(obj) on whatever object you get back from the query
to see whether you have a string "foo" or a unicode object u"foo".

* Read http://wiki.pylonshq.com/display/pylonsdocs/Unicode.  At some
point, a few of us at the Pylons project did our best to explain this
stuff really well.
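
The repr check and the text_factory setting Kelly mentioned can be
sketched with the standard-library sqlite3 module.  This assumes Python 3
and an in-memory database; the table and column names are made up for
illustration.

```python
import sqlite3

# In-memory database with a hypothetical table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (body TEXT)")
conn.execute("INSERT INTO notes (body) VALUES (?)", (u"na\u00efve",))

# With the default text_factory, TEXT columns come back as unicode text.
row = conn.execute("SELECT body FROM notes").fetchone()
print(repr(row[0]))

# Setting text_factory to bytes returns the raw stored bytes instead,
# so decoding becomes the caller's job.
conn.text_factory = bytes
raw = conn.execute("SELECT body FROM notes").fetchone()[0]
print(repr(raw))

decoded = raw.decode("utf-8")
conn.close()
```

If repr shows text in the first case and UTF-8 bytes in the second, the
database side is fine and any encoding error is coming from later code.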

Best Regards,
-jj

-- 
I, for one, welcome our new Facebook overlords!
http://jjinux.blogspot.com/

