[Baypiggies] Fwd: [sugar] Database Activity

Kelly Yancey kelly at nttmcl.com
Mon Mar 10 08:12:27 CET 2008


Edward Cherlin wrote:
> Python UTF-8 issue. Can anybody help?
> 

   With no code to go on, I am relying on my psychic debugging prowess....

 >  Question 1:  Why does pysqlite convert the utf-8 string to latin-1 in
 >  the *query process* when my sqlite settings are for utf-8?
 >  i.e., sqlite.h macro: #define SQLITE_UTF8 1

   It doesn't*.  The poster's code did (or a library he is calling did). 
  See answer to Question 2.

   * Unless you set the text_factory to something other than unicode,
     that is.

http://oss.itsystementwicklung.de/download/pysqlite/doc/usage-guide.html#text_factory
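
   A minimal sketch of the footnote above.  It assumes Python 3's
standard-library sqlite3 module (the descendant of pysqlite), where the
default text_factory is str rather than Python 2's unicode.  The table
name "notes" and the sample string are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (body TEXT)")  # hypothetical table
conn.execute("INSERT INTO notes VALUES (?)", ("caf\u00e9",))

# Default text_factory: TEXT columns come back as unicode text (str).
default_value = conn.execute("SELECT body FROM notes").fetchone()[0]

# Changing text_factory changes what the driver hands back.
# With bytes, the caller receives the raw UTF-8 stored by SQLite.
conn.text_factory = bytes
raw_value = conn.execute("SELECT body FROM notes").fetchone()[0]

print(type(default_value).__name__, repr(default_value))
print(type(raw_value).__name__, repr(raw_value))
conn.close()
```

   With the default setting the driver does the decoding; any other
representation is something the caller asked for.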

 >
 >  Question 2:  Does the pysqlite user (programmer) have any control
 >  over this query/conversion?  (I see nothing in the Python Database
 >  API, or the pysqlite/sqlite documentation, about either automatic or
 >  specified conversions.  Also, this conversion does not occur using
 >  the sqlite3 client.)
 >

   Yes, see footnote to Question 1.
   (I guess he didn't read the documentation very closely. :( )

   My experience is that if a unicode string is recorded to the 
database, it will be returned as a unicode string.
   Now if you use that unicode object in a string context somewhere, 
Python will try to convert it using your default encoding, which the 
original poster states is "ascii".  Look fishy?  I think so.  Note that 
the error is a Unicode*Encode*Error, not a Unicode*Decode*Error.  The 
string returned by pysqlite was unicode; the poster's code is trying to 
encode it into a non-unicode character set (ascii) somewhere.
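
   To make the Encode/Decode distinction concrete, here is a small
sketch.  It assumes Python 3, which no longer performs the implicit
conversion described above, so the encode and decode calls are written
out by hand.  The sample string is made up.

```python
text = "caf\u00e9"  # unicode text, like the value the driver returns

# Encoding: unicode text -> bytes.  ASCII cannot represent U+00E9.
try:
    text.encode("ascii")
    encode_error = None
except UnicodeError as exc:
    encode_error = type(exc).__name__

# Decoding: bytes -> unicode text.  The UTF-8 bytes are not valid ASCII.
try:
    text.encode("utf-8").decode("ascii")
    decode_error = None
except UnicodeError as exc:
    decode_error = type(exc).__name__

print(encode_error)
print(decode_error)
```

   An encode error means the program already held unicode text and was
turning it into bytes, which is the direction seen in the poster's
traceback.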

 >  Question 3:  Is this a pysqlite bug or a feature, and why?"

   If my hunch is right, then it has nothing to do with pysqlite.  It is 
a python feature.  The original poster has a bug in their code.  No 
surprises here.

   How to confirm if my hunch is right:

   1.  Set the default encoding to 'utf-8'.  Run test program.  Does it
       die?  (The original poster seems to recognize that this would be a
       good way to identify where the problem lies but then refuses to
       actually try it?!?)
   2.  If the UnicodeEncodeError goes away, congratulations you have
       identified that the bug is somewhere in your code.  You got a
       perfectly good unicode object back from pysqlite and
       erroneously used it in a non-unicode string context.  Python
       tried to convert for you.  But you asked it to convert to 'ascii'
       which it can't do.  You got an exception.  Please do not post
       your code for us to find your bug for you.
   3.  Set the default encoding back to 'ascii' if you like.
   4.  Look a little higher up in your traceback to find *your* code.
       That is where you are erroneously using the unicode object in
       a string context.  You may be calling a third-party API that
       expects strings rather than unicode objects.  Read the docs.  Fix
       your code.  Rinse and repeat until all of your UnicodeEncodeError
       exceptions go away.
   5.  Audit your code for further similar bugs.
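
   Steps 2 and 4 can be sketched roughly as follows, again assuming
Python 3.  The functions write_bytes, buggy_report and fixed_report are
hypothetical stand-ins; write_bytes plays the part of a third-party API
that expects encoded bytes rather than unicode text.

```python
import io
import traceback


def write_bytes(sink, data):
    """Hypothetical stand-in for a third-party API that expects bytes."""
    sink.write(data)


def buggy_report(sink, text):
    # Bug: converts unicode text with the 'ascii' codec.
    write_bytes(sink, text.encode("ascii"))


def fixed_report(sink, text):
    # Fix: choose the encoding explicitly at the boundary.
    write_bytes(sink, text.encode("utf-8"))


text = "caf\u00e9"
sink = io.BytesIO()

try:
    buggy_report(sink, text)
    failing_function = None
except UnicodeEncodeError as exc:
    # The last traceback frame is where the conversion was attempted.
    failing_function = traceback.extract_tb(exc.__traceback__)[-1].name

fixed_report(sink, text)
print(failing_function)
print(sink.getvalue())
```

   The traceback points at the caller's own function, not at the
database driver, and encoding explicitly at that boundary is the same
fix as the req.write(elem.encode('utf-8')) suggestion quoted below.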

   The worst part about this question is that Christian Boos already 
gave the same answer (albeit more succinctly)!
http://osdir.com/ml/python.db.pysqlite.user/2006-04/msg00021.html

   Kelly

   P.S. I was feeling pretty generous when I started my free psychic
        debugging session.  But I swear I can't help but think this
        was all a ploy to boost osdir's ad views and now I'm quite
        irritable.

-- 
Kelly Yancey
http://kbyanc.blogspot.com/

> 
> ---------- Forwarded message ----------
> From: 7150 <linux.1 at litenverden.com>
> Date: Sat, Mar 8, 2008 at 4:02 PM
> Subject: Re: [sugar] Database Activity
> To: sugar at lists.laptop.org
> 
>  This will be my last post on this topic until I learn a bit more.
> 
>  It appears that pysqlite has some data integrity issues.
> 
>  Interesting link to pysqlite "bug" discussion:
>  http://osdir.com/ml/python.db.pysqlite.user/2006-04/msg00020.html
> 
>  ---
> 
>  "I have built sqlite databases containing utf-8 encoded text.  When I
>  access them using pysqlite, utf-8 codepoints appear to be converted to
>  latin-1."
> 
>  It does this.
> 
>  How to stop it?
> 
>  An answer on the list was
>  (http://osdir.com/ml/python.db.pysqlite.user/2006-04/msg00022.html):
> 
>   > This should be:
>   >
>   >      req.write(elem.encode('utf-8'))
>   >
>   > write() expects a str object, so what happened
>   > in your code was an implicit conversion of your
>   > unicode object to a str, doing the encoding using
>   > the default system encoding, here 'ascii'.
> 
>  The req.write stuff is Greek to me, but I'll see what I can find out
>  about it.
> 
>  Then from the same post:
> 
>  "So, upon reflection, my situation boils down to this:
> 
>  (1) A utf-8 encoded string in an sqlite database is queried using
>  pysqlite which returns a latin-1 string (a
>  gratuitous/silent/unspecified/unrequested conversion).
> 
>  . . .
> 
>  pysqlite QUESTIONS:
> 
>  Question 1:  Why does pysqlite convert the utf-8 string to latin-1 in the
>  *query process* when my sqlite settings are for utf-8?
>  i.e., sqlite.h macro: #define SQLITE_UTF8 1
> 
>  Question 2:  Does the pysqlite user (programmer) have any control over
>  this query/conversion?  (I see nothing in the Python Database API, or
>  the pysqlite/sqlite documentation, about either automatic or specified
>  conversions.  Also, this conversion does not occur using the sqlite3
>  client.)
> 
>  Question 3:  Is this a pysqlite bug or a feature, and why?"
> 
>  The group had no further answer for the guy.
> 
>  ---
> 
>  Someone else has noted that the SQLite3 client does not transform the
>  data.  If I learn how to query my UTF-8 database on the XO, I'll let you
>  know. But, it's not an XO problem.
> 
> 
> 
>  ---
> 
>  http://www.litenverden.org
>  _______________________________________________
>  Sugar mailing list
>  Sugar at lists.laptop.org
>  http://lists.laptop.org/listinfo/sugar
> 
> 
> 


